Abstract: A sequential problem in decentralized detection is
considered. Two observers can make repeated noisy observations 
of a binary hypothesis on the state of the environment. At any 
time, observer 1 can stop and send a final binary message to 
observer 2 or it may continue to take more measurements. Every 
time observer 1 postpones its final message to observer 2, it incurs 
a penalty. Observer 2's operation under two different scenarios is 
explored. In the first scenario, observer 2 waits to receive the final 
message from observer 1 and then starts taking measurements of 
its own. It is then faced with a stopping problem on whether to 
stop and declare a decision on the hypothesis or to continue taking 
measurements. In the second scenario, observer 2 starts taking 
measurements from the beginning. It is then faced with a different 
stopping problem. At any time, observer 2 can decide whether to 
stop and declare a decision on the hypothesis or to continue to 
take more measurements and wait for observer 1 to send its final 
message. Parametric characterizations of optimal policies for the
two observers are obtained under both scenarios. A sequential
methodology for finding the optimal policies is presented. The
parametric characterizations are then extended to problems with
an increased communication alphabet for the final message from
observer 1 to observer 2, and to the case of multiple peripheral
sensors that each send a single final message to a coordinating
sensor that makes the final decision on the hypothesis.



I. Introduction 

Decentralized detection problems are motivated by appli- 
cations in large scale decentralized systems such as sensor 
networks and surveillance networks. In such networks, sensors 
receive different information about the environment but share 
a common objective, for example to detect the presence of 
a target in a surveillance area. Sensors may be allowed to 
communicate but they are constrained to exchange only a 
limited amount of information because of energy constraints, 
data storage and data processing constraints, communication 
constraints etc. 

Decentralized detection problems may be static or sequen- 
tial. In static problems, sensors make a fixed number of 
observations about a hypothesis on the state of the environment 
which is modeled as a random variable H. Sensors may 
transmit a single message (a quantized version of their obser- 
vations) to a fusion center which makes a final decision on H. 
Such problems have been extensively studied since their initial 
formulation in [1] (See the surveys in [2], [3] and references 
therein). In most such formulations, it has been shown that 
person-by-person optimal decision rules (as defined in [4])
for a binary hypothesis detection problem are characterized
by thresholds on the likelihood ratio (or equivalently on the
posterior belief on the hypothesis). Under certain conditions,
such as a large number of identical sensors, it has been shown
that it is optimal to use identical quantization rules at all sensors
([5], [6]). A related information-theoretic formulation with
constraints on communication from a sensor to a fusion center
appears in [7].

In sequential problems, the number of observations taken by 
the sensors is not fixed a priori. Two distinct formulations have 
been considered for sequential problems. In one formulation, 
at each time instant local/peripheral sensors send a message 
about their observations to a fusion center/coordinator. At 
each time instant, the fusion center decides whether to receive 
more messages or to declare a decision on the hypothesis. 
Thus the fusion center is faced with an optimal stopping 
problem whereas the peripheral sensors are not faced with an 
optimal stopping problem. The case where peripheral sensors 
can only use their current observation and possibly all past 
transmissions of all sensors to decide what message to send 
to the fusion center has been studied in [8]. No positive results 
have been found in the case when sensors remember their past 
observations as well. 

A second formulation may be motivated by situations where 
continuous communication with a fusion center is too costly 
because of the various constraints mentioned earlier. In this
formulation, each sensor locally decides when to stop taking 
more measurements and only sends a final message to a 
fusion center. Each sensor pays a penalty for delaying its final 
decision. The fusion center has to wait to receive the final 
messages from all sensors and then combine them to produce 
a final decision on the hypothesis. Thus, in this formulation, 
each local/peripheral sensor is faced with an optimal stopping 
problem but the coordinator does not have a stopping problem. 
A version of this problem (called the decentralized Wald 
problem) was formulated in [9] and it was shown that at 
each time instant, optimal policies for the peripheral sensors 
are described by two thresholds. The computation of these 
thresholds requires solution of two coupled sets of dynamic
programming equations. Similar results were obtained in a
continuous time setting in [10]. Although this formulation
reduces the communication requirements, the final decision
at the fusion center is made only when all sensors have
sent their messages. In a similar formulation, the problem of 
quickest detection of the change of state of a Markov chain 
was considered in [11]. 

In the problem we consider in this paper, the peripheral 
sensors as well as a coordinating sensor are faced with optimal
stopping problems. The peripheral sensors decide locally when
they want to stop taking measurements and send a final message
to a special coordinating sensor, say S0. The coordinating
sensor S0 is faced with a stopping problem of its own. At any
time, the coordinating sensor S0 uses its own measurements
and the messages it has received so far to make a decision on
whether to stop and declare a final decision on the hypothesis
or continue to take more measurements and wait for messages
from other sensors that have not yet sent a final message. As
in [9], each sensor (peripheral sensors and the coordinating
sensor) incurs a penalty for delaying its final message/decision,
and a cost depending on S0's final decision on the hypothesis
and the true value of the hypothesis is incurred in the end.

We first consider a simple two sensor version of this 
problem and obtain a parametric characterization of optimal 
policies. We prove that at each time instant, an optimal policy 
of the peripheral sensor is characterized by at most 4 thresh- 
olds on its posterior belief on the hypothesis; an optimal policy 
of the coordinating sensor is characterized by 2 thresholds (on 
its own posterior belief) that depend on the messages received 
from the peripheral sensor. This characterization differs from
the classical two threshold characterization found in the cen- 
tralized and the decentralized Wald problems ([12], [9]). The 
computation of these optimal thresholds is a difficult problem. 
We present a sequential methodology that decomposes the 
overall optimization problem into several smaller problems 
that may be solved to determine the optimal thresholds at 
each time instant. We extend our results to a problem with 
multiple peripheral sensors that send their final message to 
the coordinating sensor who makes the final decision on the 
hypothesis. We show that qualitative properties of the optimal 
policies of the peripheral sensors and the coordinating sensor 
are the same as in the two-sensor problem.

The rest of the paper is organized as follows. In Section II,
we formulate two versions of our problem with two observers.
We obtain qualitative results on the nature of optimal policies
for the two sensors in Sections III and IV. We present a
sequential methodology for computing optimal policies in
Section V. In Section VI, we extend our qualitative results to
infinite horizon analogues of our problem. A generalization
to more than binary communication alphabet is presented
in Section VII. We extend our results to a multiple sensor
(more than 2) problem in Section VIII. Finally, we conclude
in Section IX.

Notation: Throughout this paper, $X_{1:t}$ refers to the sequence
$X_1, X_2, \ldots, X_t$. Subscripts are used as time index and
superscripts are used as the index of the sensor. We use capital
letters to denote random variables and the corresponding lower
case letters for their realizations.

II. Problem formulation 

A. The Model 

Consider a binary hypothesis problem where the true hypothesis
is modeled as a random variable H taking values
0 or 1 with known prior probabilities:

$P(H = 0) = p_0, \qquad P(H = 1) = 1 - p_0.$



Consider two observers: Observer 1 (O1) and Observer
2 (O2). We assume that each observer can make noisy
observations of the true hypothesis. Conditioned on the
hypothesis H, the following statements are assumed to be
true:

1. The observation of the i-th observer at time t, $Y^i_t$
(taking values in the set $\mathcal{Y}^i$), either has a discrete distribution
($P^i_t(\cdot \mid H)$) or admits a probability density function ($f^i_t(\cdot \mid H)$).

2. Observations of the i-th observer at different time instants
are conditionally independent given H.

3. The observation sequences at the two observers are
conditionally independent given H.




[Figure: O1 sends a single, final transmission to O2; O2 makes the final decision.]

Fig. 1. Decentralized Detection

Observer 1 observes the measurement process $Y^1_t$, t =
1, 2, .... At any time t, after having observed the sequence
of observations $Y^1_{1:t}$, observer 1 can decide either to stop and
send a binary message 0 or 1 to observer 2 or to postpone its
decision and get another measurement. Each time observer 1
postpones its decision, a cost of $c^1$ is incurred. (The cost $c^1$
incorporates the additional cost of taking a new measurement,
the energy cost of staying on for another time step and/or
a penalty for delaying the decision.) Note that observer 1
transmits only a single final binary message to observer 2. The
decision of observer 1 at time t is based on the entire sequence
of observations till that time; in other words, observer 1 has
perfect recall. Thus, we have that

$Z^1_t = \gamma^1_t(Y^1_{1:t})$    (1)

where $Z^1_t$ is observer 1's message at time t to observer 2 and
$\gamma^1_t$ is the decision-function used by O1 at time t. $Z^1_t$ belongs
to the set {0, 1, b}, where we use b for blank message, that
is, no transmission. The sequence of functions $\gamma^1_t$, t = 1, 2, ...,
constitutes the policy of observer 1. Let $\tau^1$ be the stopping time
when observer 1 sends a final message to observer 2, that is,

$\tau^1 = \min\{t : Z^1_t \in \{0, 1\}\}$    (2)

We allow two possibilities for the operation of observer 2.
Case A: In this case, O2 first waits for O1 to send a final
message. After receiving observer 1's final message, observer
2 can decide either to stop and declare a decision on the
hypothesis or take additional measurements on its own. After
observer 2 has made k measurements (k = 1, 2, ...), it can
decide to stop and declare a final decision on the hypothesis or
take a new measurement. Each time observer 2 decides to take
another measurement it incurs a cost $c^2$. Whenever observer 2
makes a final decision $U \in \{0, 1\}$ on the hypothesis, it incurs
a cost $J(U, H)$. As in the case of observer 1, we assume
observer 2 has perfect recall. Let $U^2_k \in \{0, 1, N\}$ be the decision
made by observer 2 after receiving messages ($Z^1_{1:\tau^1}$) from
observer 1 and subsequently making k observations ($Y^2_{1:k}$) of
its own (where we use N for a null decision, that is, a decision
to continue taking measurements). Thus,

$U^2_k = \gamma^2_k(Y^2_{1:k}, Z^1_{1:\tau^1})$,    (3)

for k = 0, 1, 2, .... The message sequence $Z^1_{1:\tau^1}$ is a sequence
of $\tau^1 - 1$ blank messages followed by $Z^1_{\tau^1} = 0$ or 1. The
sequence of decision-functions $\gamma^2_k$, k = 0, 1, 2, ..., constitutes
the policy of observer 2. We define $\tau^2$ to be the number
of measurements taken before observer 2 announces its final
decision on the hypothesis, that is,

$\tau^2 = \min\{k : U^2_k \in \{0, 1\}\}$    (4)

Case B: In this case, O2 starts taking measurements at time
t = 1 without waiting for O1 to send a final message. At time
t = 1, 2, ..., we have the following time-ordering of the two
observers' observations and decisions:

[Figure: time-ordering diagram]

Fig. 2. Time ordering in P2

Thus, observer 2's decision at time t can be described as:

$U^2_t = \gamma^2_t(Y^2_{1:t}, Z^1_{1:t})$    (5)

where $U^2_t \in \{0, 1, N\}$. This decision is a function of the
observations made at O2 ($Y^2_{1:t}$) and the messages received from
O1 ($Z^1_{1:t}$). The message sequence $Z^1_{1:t}$ could be a sequence of t
blank messages received from O1 or k blanks (k < t) followed
by a 0 or 1. Let $\tau^2$ be the stopping time when observer 2
announces its final decision on the hypothesis, that is,

$\tau^2 = \min\{t : U^2_t \in \{0, 1\}\}$    (6)

Note that we allow O2 to declare a final decision without
getting the final message from O1. Also, O1 does not know
whether O2 has stopped or not, that is, there is no feedback
from O2 to O1. As in Case A, a penalty of $c^2$ is incurred every
time O2 decides to postpone its final decision and a terminal
cost of $J(U, H)$ is incurred when O2 makes its final decision
$U \in \{0, 1\}$.

In both the cases above, we assume that the cost parameters
$c^1$, $c^2$ are finite positive numbers and $J(U, H)$ is non-negative
and bounded by a finite constant L for all U and H. Moreover,
we assume that the cost of an error in the final decision is more
than the cost of a correct decision, that is, $J(0, 1) > J(1, 1)$ and
$J(1, 0) > J(0, 0)$. We can now formulate an optimization
problem for each of the two cases above.

1) Problem P1: We consider a finite horizon $T^1$ for
observer 1. That is, if observer 1 has not sent its final message
till time $t = T^1 - 1$, it must do so at time $T^1$. In other
words, we require that $\tau^1 \le T^1$. Similarly for observer 2
described in Case A above, we require that it can at most
take $T^2$ measurements before declaring its final decision, that
is, $\tau^2 \le T^2$. The optimization problem is to select policies
$\Gamma^1 = (\gamma^1_1, \gamma^1_2, \ldots, \gamma^1_{T^1})$ and $\Gamma^2 = (\gamma^2_0, \gamma^2_1, \gamma^2_2, \ldots, \gamma^2_{T^2})$ to
minimize

$E^{\Gamma^1, \Gamma^2}\{c^1 \tau^1 + c^2 \tau^2 + J(U^2_{\tau^2}, H)\}$    (7)

where $\tau^1$, $\tau^2$ and $U^2_k$, k = 0, 1, ... are defined by equations
(2), (4) and (3) above.

2) Problem P2: As in Problem P1, we have a finite horizon
$T^1$ for O1, that is, $\tau^1 \le T^1$, and a finite horizon $T^2$ ($> T^1$) for
O2. O2's operation is as described in Case B above. The
optimization problem is to select policies $\Gamma^1 = (\gamma^1_1, \gamma^1_2, \ldots, \gamma^1_{T^1})$
and $\Gamma^2 = (\gamma^2_1, \gamma^2_2, \ldots, \gamma^2_{T^2})$ to minimize

$E^{\Gamma^1, \Gamma^2}\{c^1 \tau^1 + c^2 \tau^2 + J(U^2_{\tau^2}, H)\}$    (8)

where $\tau^1$, $\tau^2$ and $U^2_t$, t = 1, 2, ... are defined by equations
(2), (6) and (5) above.

B. Features of the Problem 

In both the problems formulated above, the two observers 
share a common system objective given by equations (7) or
(8). The two observers, however, make decisions based on
different information. Thus, Problems P1 and P2 are team
problems. Moreover, since the actions of observer 1 influence 
the information available to observer 2, these are dynamic team 
problems [13]. Dynamic team problems are known to be hard 
as they usually involve non-convex functional optimization 
over the space of policies of the decision-makers. Finding 
structural results for these problems is an important step 
toward reducing the complexity of these problems. In the next 
two sections, we will establish qualitative properties of the 
optimal policies of the two observers. 

III. Qualitative Properties of Optimal Policies for
Observer O1

A. Information State for O1

Consider Problem P2 first. We first derive an information
state for O1. For that purpose, we define:

$\pi^1_t(Y^1_{1:t}) := P(H = 0 \mid Y^1_{1:t})$    (9)

The probability $\pi^1_t$ is observer 1's belief on the hypothesis
based on its sequence of observations till time t. (For t = 0,
we have $\pi^1_0 = p_0$.) The following result provides a
characterization of O1's optimal policy.

Theorem 1: For Problem P2, with an arbitrary but fixed
policy $\Gamma^2$ of O2, there is an optimal policy for O1 of the
form:

$Z^1_t = \gamma^1_t(\pi^1_t)$    (10)

for $t = 1, 2, \ldots, T^1$. In particular, if globally optimal policies
$\Gamma^{1,*}, \Gamma^{2,*}$ exist, then $\Gamma^{1,*}$ can be assumed to be of the form
in (10) without loss of optimality. Moreover, for a fixed $\Gamma^2$,
the optimal policy of O1 can be determined by selecting
the minimizing option at each step of the following dynamic
program:

$V_{T^1}(\pi) := \min\{$
  $E^{\Gamma^2}[c^2 \tau^2 + J(U^2_{\tau^2}, H) \mid \pi^1_{T^1} = \pi,\ Z^1_{1:T^1-1} = b_{1:T^1-1},\ Z^1_{T^1} = 0]$,
  $E^{\Gamma^2}[c^2 \tau^2 + J(U^2_{\tau^2}, H) \mid \pi^1_{T^1} = \pi,\ Z^1_{1:T^1-1} = b_{1:T^1-1},\ Z^1_{T^1} = 1]$
$\}$    (11)

and for $k = (T^1 - 1), \ldots, 2, 1$,

$V_k(\pi) := \min\{$
  $E^{\Gamma^2}[c^2 \tau^2 + J(U^2_{\tau^2}, H) \mid \pi^1_k = \pi,\ Z^1_{1:k-1} = b_{1:k-1},\ Z^1_k = 0]$,
  $E^{\Gamma^2}[c^2 \tau^2 + J(U^2_{\tau^2}, H) \mid \pi^1_k = \pi,\ Z^1_{1:k-1} = b_{1:k-1},\ Z^1_k = 1]$,
  $c^1 + E^{\Gamma^2}[V_{k+1}(\pi^1_{k+1}) \mid \pi^1_k = \pi,\ Z^1_{1:k} = b_{1:k}]$
$\}$    (12)

where the superscript $\Gamma^2$ in the expectation denotes that the
expectation is defined for a fixed choice of $\Gamma^2$. ($Z^1_{1:k} = b_{1:k}$
denotes a sequence of k blank messages.)
Proof: See Appendix A.

■

The result of Theorem 1 can be intuitively explained as
follows. At any time t, if observer 1 has not already
sent its final message, it has to choose between three
actions: send 0, 1 or b. In order to evaluate the expected
cost of sending a 0 or 1, O1 needs a belief on the state of
the environment, that is, a belief on H and a belief on the
information available to O2. Since O1 has not yet sent a final
message, the information at O2 consists of $Z^1_{1:t-1} = b_{1:t-1}$,
the decision of O1 at time t ($Z^1_t$) and the observations that
O2 has made or may make in the future. Thus O1 needs to
form a belief on $Y^2_{1:T^2}$, since it already knows the rest of O2's
information. Now because of conditional independence of
observations at the two observers, it is sufficient to form a
belief on H to know the probabilities of $Y^2_{1:T^2}$. Similarly, to
evaluate the cost of sending a b, O1 needs to form a belief
on O2's information and on the information O1 may obtain
by future measurements, $Y^1_{t+1:T^1}$. Once again, conditional
independence of the observations made at different times given
H implies a belief on H is sufficient to evaluate the cost of
this action as well. These arguments indicate that the decisions
at O1 should be made based only on its belief on H, that is, $\pi^1_t$.
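As an illustration of the information state discussed above, the belief on H = 0 can be updated recursively by Bayes' rule, one observation at a time, using the conditional independence of observations given H. The following is a minimal sketch under that assumption; the function names `update_belief` and `belief_after` are hypothetical and do not appear in the paper.

```python
def update_belief(prior_h0, lik_h0, lik_h1):
    """Return P(H=0 | y) given the prior P(H=0) and the likelihoods
    P(y | H=0) and P(y | H=1) of the newly observed value y."""
    num = lik_h0 * prior_h0
    den = num + lik_h1 * (1.0 - prior_h0)
    if den == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return num / den


def belief_after(prior_h0, likelihood_pairs):
    """Fold a sequence of (P(y_t | H=0), P(y_t | H=1)) pairs into the belief.

    Valid because observations at different times are assumed to be
    conditionally independent given H.
    """
    belief = prior_h0
    for lik_h0, lik_h1 in likelihood_pairs:
        belief = update_belief(belief, lik_h0, lik_h1)
    return belief
```

For instance, an observation that is equally likely under both hypotheses leaves the belief unchanged, while an observation that is impossible under H = 1 moves the belief to 1.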

Corollary: Theorem 1 holds for O1 in Problem P1 also.
Proof: This result can be obtained by following the steps in
Appendix A without any modifications. An intuitive explanation
of this result is as follows: In the proof of Theorem 1,
we fixed $\Gamma^2$ to any arbitrary choice. In particular, consider
any policy of O2 that waits till it gets a final decision from
O1. After it receives the final message from O1 at time $\tau^1$,
it uses only observations made after $\tau^1$ to make a decision.
This class of policies is essentially the class of policies available to
O2 in Problem P1. Since the optimal structure of O1's policy
as given in (10) holds for any choice of $\Gamma^2$, it also holds for
all possible policies of O2 in Problem P1. ■

B. Classical Two-Threshold Rules Are Not Optimal 

In the sequential detection problem with a single observer
[12], it is well known that an optimal policy is a function of
the observer's belief $\pi_t$ and is described by two thresholds at
each time t. That is, the decision at time t, $Z_t$, is given as:

$$Z_t = \begin{cases} 1 & \text{if } \pi_t \le \alpha_t \\ N & \text{if } \alpha_t < \pi_t < \beta_t \\ 0 & \text{if } \pi_t \ge \beta_t \end{cases}$$

where N denotes a decision to continue taking measurements
and $\alpha_t \le \beta_t$ are real numbers in [0, 1]. A similar two-threshold
structure of optimal policies was also established
for the decentralized Wald problem in [9]. We will show by
means of two counterexamples that such a structure is not
necessarily optimal for observer 1 in Problem P1. Example
2 is similar to an example demonstrating the sub-optimality
of threshold rules in a more general decentralized sequential
detection problem that appeared in [14].
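The classical two-threshold rule described above can be sketched as a small function of the belief on H = 0. This is only an illustration: the name `two_threshold_decision` is hypothetical, and the handling of ties at the thresholds (stopping when the belief equals a threshold) is an assumption.

```python
def two_threshold_decision(belief_h0, alpha, beta):
    """Classical two-threshold rule on the belief P(H=0).

    Returns "1" (declare H=1) for low beliefs, "0" (declare H=0) for
    high beliefs, and "N" (continue measuring) in between.
    Tie handling at the thresholds is an assumption of this sketch.
    """
    if not 0.0 <= alpha <= beta <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= alpha <= beta <= 1")
    if belief_h0 <= alpha:
        return "1"
    if belief_h0 >= beta:
        return "0"
    return "N"
```

The key structural feature is that the continuation region is a single interval between the two stopping regions, which is exactly what the counterexamples that follow show can fail for observer 1.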
Example 1 

Consider the following instance of Problem P1. We have
equal prior on H, that is, $P(H = 0) = P(H = 1) = 1/2$. O1
has a time horizon of $T^1 = 2$. Its observation space is $\mathcal{Y}^1$ =
{1, 2, 3}. The observations at time t = 1 have the following
conditional probabilities:

Observation      1        2      3
P(.|H = 0)       0        p      (1-p)
P(.|H = 1)       (1-p)    p      0

and at time t = 2 have the following conditional probabilities:

Observation      1        2      3
P(.|H = 0)       0        q      (1-q)
P(.|H = 1)       (1-q)    q      0

where $p, q \in [0, 1]$. Observe that O1's belief on {H = 0} (that
is, $\pi^1_t$) only takes 3 possible values (0, 1 and 1/2) after any
number of measurements. O1 has to send a final message,
0 or 1, to O2 by time $T^1 = 2$. If O1 delays sending its final
message to time t = 2, an additional cost $c^1$ is incurred. After
receiving a message from O1, observer 2 can either declare a
decision on the hypothesis or take at most 1 more measurement
of its own, that is, we have $T^2 = 1$. The measurements of O2
are assumed to be noiseless, so when O2 takes a measurement
it knows exactly the value of H. However, the measurement
comes at a cost of $c^2$. We assume that $J(U, H) = 0$ if $U = H$,
and in the case of a mistake ($U \ne H$), we assume that the
cost is sufficiently high so that unless O2 is certain from O1's
messages what the true hypothesis is, it will prefer taking a
measurement at a cost $c^2$ rather than making a guess. At p =
0.6, $c^2 > 3c^1$, it can be easily verified that the best threshold
rule for observer 1 is described as follows:



$$Z^1_1 = \begin{cases} 1 & \text{if } \pi^1_1 = 0 \\ b & \text{if } \pi^1_1 = 1/2 \\ 0 & \text{if } \pi^1_1 = 1 \end{cases}$$

and

$$Z^1_2 = \begin{cases} 1 & \text{if } \pi^1_2 = 0 \\ 0 & \text{if } \pi^1_2 > 0 \end{cases}$$

If observer 2 receives 0 or 1 at time t = 1, it declares
the received message as the final decision on the hypothesis,
otherwise it waits for a final message from O1. At t = 2, if
O2 receives 1, it declares 1 as the final decision, otherwise it
takes a measurement. Then the expected cost of this policy is
given as: $p c^1 + p(1 + q)c^2/2$ (since the system incurs a cost
$c^1$ with probability p and a cost $c^2$ with probability $p/2 + pq/2$).

Now consider the following non-threshold policy for
observer 1,

$$Z^1_1 = \begin{cases} 1 & \text{if } \pi^1_1 = 0 \\ 0 & \text{if } \pi^1_1 = 1/2 \\ b & \text{if } \pi^1_1 = 1 \end{cases}$$

and

$$Z^1_2 = \begin{cases} 1 & \text{if } \pi^1_2 = 0 \\ 0 & \text{if } \pi^1_2 > 0 \end{cases}$$

Unlike a classical two-threshold rule, the above rule requires
O1 to send a blank symbol at time 1 even though O1 is certain
that the true H is 0. If observer 2 receives 0 at time t = 1, it
takes a measurement and incurs a cost $c^2$. If O2 receives a
1 at t = 1, it declares 1 as the final decision. If O2 receives
a b at time t = 1, it waits for the final message at t = 2
and then declares the received message as its final decision on
the hypothesis. Then the expected cost of this policy is given
as: $p c^2 + (1 - p)c^1/2$ (since the system incurs a cost $c^2$ with
probability p and a cost $c^1$ with probability $(1 - p)/2$). It is
now easily seen that at p = 0.6 and $c^2 > 3c^1$, if we choose
$q > 1 - \frac{4c^1}{3c^2}$, the non-threshold policy outperforms the best
threshold policy.
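The two expected costs in Example 1 can be checked by brute-force enumeration over the hypothesis and the two observations of O1. The sketch below assumes the observation tables read P(.|H=0) = (0, p, 1-p) and P(.|H=1) = (1-p, p, 0) at t = 1 (and likewise with q at t = 2), that c1 and c2 denote the delay cost of O1 and the measurement cost of O2, and that O2's noiseless measurement always yields a correct decision. The function name `example1_cost` is hypothetical.

```python
from itertools import product


def example1_cost(policy, p, q, c1, c2):
    """Expected cost of the 'threshold' or 'non_threshold' policy pair
    of Example 1, computed by enumerating (H, y1, y2)."""
    def table(r):
        # table(r)[h][y-1] = P(y | H=h) for y = 1, 2, 3 (assumed reading).
        return {0: (0.0, r, 1.0 - r), 1: (1.0 - r, r, 0.0)}

    t1, t2 = table(p), table(q)
    total = 0.0
    for h, y1, y2 in product((0, 1), (1, 2, 3), (1, 2, 3)):
        prob = 0.5 * t1[h][y1 - 1] * t2[h][y2 - 1]
        if prob == 0.0:
            continue
        # With equal priors the belief after y1 is 0, 1/2 or 1.
        certain_h1, certain_h0 = (y1 == 1), (y1 == 3)
        cost = 0.0
        if policy == "threshold":
            if not (certain_h1 or certain_h0):
                cost += c1              # O1 sends a blank and waits
                if y2 != 1:             # belief at t=2 is not 0: O1 sends 0
                    cost += c2          # O2 cannot be certain and measures
        elif policy == "non_threshold":
            if certain_h0:
                cost += c1              # blank at t=1, then 0 at t=2
            elif not certain_h1:
                cost += c2              # 0 at t=1 tells O2 to measure
        else:
            raise ValueError("unknown policy")
        total += prob * cost
    return total
```

With p = 0.6, c1 = 1 and c2 = 4, the enumeration reproduces the closed-form costs above, and the non-threshold policy is cheaper for q = 0.9 but not for q = 0.5, consistent with a switch-over at q = 1 - 4 c1 / (3 c2) = 2/3 under these assumptions.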

Discussion of the Example: The principle behind a threshold
rule is to stop and send a message if O1 is certain, otherwise
postpone the decision and take another measurement. The
additional cost of delay is justified by the likelihood of getting
a good measurement in the next time instant. In our example,
if O1 gets the observations 1 or 3 at t = 1 and is able to convey
to O2 that it is certain about the true hypothesis and what this
hypothesis is, then it prevents O2 from taking a measurement,
thus saving a cost $c^2$. The threshold rule achieves this objective
by sending 0 for observation 3 and 1 for observation 1.
However, in the case when O1 gets measurement 2, it decides
to wait for the next observation. By choosing q sufficiently
high, the likelihood of getting a good measurement at t = 2
can be made very low. In this case, the cost of delaying a
decision ($c^1$) begins outweighing the expected payoff from a
new measurement. The non-threshold rule essentially tries to
correct this drawback. If at time t = 1, O1 gets measurement
2, it stops and sends 0 to O2. At O2, this is interpreted as
a message to go and take a measurement of its own. Note that
the non-threshold rule still ensures that whenever O1 is certain
about H, it is able to send enough information to O2 to prevent
it from taking a measurement.

Example 2 

Consider the same problem as in Example 1 but with O1's
observations at t = 1 now given by the following conditional
probabilities.

Observation      1        2       3       4
P(.|H = 0)       0        p/3     2p/3    (1-p)
P(.|H = 1)       (1-p)    2p/3    p/3     0

O1's observations at time t = 2 are just noise and give no new
information. The rest of the model is the same as in Example 1.
Note that the observations are indexed in order of the posterior
belief $\pi^1$ they generate, that is, P(H = 0 | Observation 1) <
P(H = 0 | Observation 2) and so on. If O1 postpones its final
message to time t = 2, it has to pay an additional cost of
$c^1$. Observer 2 can make a noiseless measurement at a cost
of $c^2$. As in Example 1, observer 2's cost of making a wrong
decision is chosen sufficiently high so that unless it is certain
from O1's message what the true hypothesis is, O2 will prefer
taking a measurement at a cost $c^2$ to making a guess. It
can be shown that for equal prior ($p_0 = 1/2$), $c^2 > 2c^1$ and
1/2 < p < 1, a non-threshold rule for O1 (given below)
performs better than any threshold policy.

• At t = 1, send 0 if observation 2 occurs and 1 if
observation 3 occurs. Send a blank otherwise.

• At t = 2, send 1 if $\pi^1_2$ is less than 1/2 and 0 otherwise.
The corresponding policy for O2 is as follows:

• At t = 1, if a 0 or 1 is received, take a measurement,
otherwise wait till t = 2.

• At t = 2, declare the received symbol as the final decision.
The cost of the above choice of policies is: $p c^2 + (1 - p)c^1$.
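The cost of a policy pair in Example 2 can likewise be evaluated by enumeration. The sketch below is an illustration under stated assumptions: the observation table is read as P(.|H=0) = (0, p/3, 2p/3, 1-p) and P(.|H=1) = (1-p, 2p/3, p/3, 0); at t = 2 O1 sends 1 if its belief on H = 0 is below 1/2 and 0 otherwise; and O2 measures (at cost c2) exactly when the symbol it received, together with its arrival time, does not identify H with certainty. The function name `example2_cost` and the two threshold rules used for comparison are hypothetical choices, not an exhaustive search over threshold policies.

```python
from itertools import product


def example2_cost(rule_t1, p, c1, c2):
    """Expected cost when O1 uses rule_t1 (observation -> '0', '1' or 'b')
    at t = 1. c1 is O1's delay cost, c2 is O2's measurement cost."""
    like = {0: (0.0, p / 3, 2 * p / 3, 1 - p),
            1: (1 - p, 2 * p / 3, p / 3, 0.0)}
    outcomes = []  # (probability, signal seen by O2, whether O1 delayed)
    consistent = {}  # signal -> set of hypotheses that can produce it
    for h, y in product((0, 1), (1, 2, 3, 4)):
        prob = 0.5 * like[h][y - 1]
        if prob == 0.0:
            continue
        belief_h0 = like[0][y - 1] / (like[0][y - 1] + like[1][y - 1])
        msg = rule_t1[y]
        if msg == "b":
            # Observations at t = 2 are pure noise, so the belief is unchanged.
            signal = (2, "1" if belief_h0 < 0.5 else "0")
        else:
            signal = (1, msg)
        outcomes.append((prob, signal, msg == "b"))
        consistent.setdefault(signal, set()).add(h)
    total = 0.0
    for prob, signal, delayed in outcomes:
        measures = len(consistent[signal]) > 1  # O2 is not certain about H
        total += prob * (c1 * delayed + c2 * measures)
    return total


non_threshold = {1: "b", 2: "0", 3: "1", 4: "b"}
threshold_wait = {1: "1", 2: "b", 3: "b", 4: "0"}   # illustrative threshold rule
threshold_stop = {1: "1", 2: "0", 3: "0", 4: "0"}   # illustrative threshold rule
```

For example, with p = 0.75, c1 = 1 and c2 = 3, `example2_cost(non_threshold, 0.75, 1.0, 3.0)` matches p c2 + (1 - p) c1 and is lower than the cost of either illustrative threshold rule.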

C. Parametric Characterization of Optimal Policies 

An important advantage of the threshold rules in the case
of the centralized or the decentralized Wald problem is that they
reduce the problem of finding the globally optimal policies
from a sequential functional optimization problem to a sequential
parametric optimization problem. Even though we have
established that a classical two-threshold rule is not necessarily
optimal for our problem, it is still possible to get a finite parametric
characterization of an optimal policy for observer 1. Such a
parametric characterization provides significant computational
advantage in finding optimal policies, for example by reducing
the search space for an optimal policy.

In Theorem 1, we have established that for an arbitrarily
fixed choice of O2's policy, the optimal policy for O1 can
be determined by backward induction using the functions
$V_k(\pi)$, $k = T^1, \ldots, 2, 1$. We will call $V_k$ the value function
at time k. We have the following lemma.

Lemma 1: In Problem P1 or P2, with a fixed (but arbitrary)
choice of $\Gamma^2$, the value function at $T^1$ can be expressed as:

$V_{T^1}(\pi) := \min\{L^0_{T^1}(\pi), L^1_{T^1}(\pi)\}$    (13)

where $L^0_{T^1}(\cdot)$ and $L^1_{T^1}(\cdot)$ are affine functions of $\pi$ that depend
on the choice of O2's policy $\Gamma^2$. Also, the value function at
time k can be expressed as:

$V_k(\pi) := \min\{L^0_k(\pi), L^1_k(\pi), G_k(\pi)\}$    (14)

where $L^0_k(\cdot)$ and $L^1_k(\cdot)$ are affine functions of $\pi$ and $G_k(\cdot)$ is a
concave function of $\pi$. The functions $L^0_k(\cdot)$, $L^1_k(\cdot)$ and $G_k(\cdot)$
depend on the choice of $\Gamma^2$.
Proof: See Appendix B. ■
Theorem 2: In Problem P1 or P2, for any fixed policy $\Gamma^2$ of
O2, an optimal policy for O1 can be characterized by at most
4 thresholds. In particular, without any loss in performance,
one can assume O1's policy to be of the following form:

$$Z^1_{T^1} = \begin{cases} 1 & \text{if } \pi^1_{T^1} \le \alpha_{T^1} \\ 0 & \text{if } \pi^1_{T^1} > \alpha_{T^1} \end{cases}$$

where $0 \le \alpha_{T^1} \le 1$ and for $k = 1, 2, \ldots, T^1 - 1$,

$$Z^1_k = \begin{cases} b & \text{if } \pi^1_k < \alpha_k \\ 1 & \text{if } \alpha_k \le \pi^1_k \le \beta_k \\ b & \text{if } \beta_k < \pi^1_k < \delta_k \\ 0 & \text{if } \delta_k \le \pi^1_k \le \theta_k \\ b & \text{if } \pi^1_k > \theta_k \end{cases}$$

where $0 \le \alpha_k \le \beta_k \le \delta_k \le \theta_k \le 1$.

Proof: Theorem 2 is an immediate consequence of
Lemma 1, since taking the minimum of straight lines and concave
functions can partition the interval [0, 1] into at most five
regions. The thresholds above essentially signify the boundaries
of these regions. For a given $\Gamma^2$, it is possible that at some
time instant k, the optimal policy for O1 partitions the belief
interval [0, 1] as {b, 0, b, 1, b} instead of {b, 1, b, 0, b}. In this
case, it is easily seen that simply interchanging the roles of
0 and 1 in O1's policy and in $\Gamma^2$ at time k would result in the
threshold structure of the theorem without loss of performance.

■

It is of course possible that in specific cases, some of these
five regions are absent, which would correspond to some of the
above thresholds having the same value. For example, in the
non-threshold rule given in Example 1 earlier, the rule at
t = 1 corresponds to having $\alpha = 0$ and $\beta = \delta$, which results
in a 3-interval partition of [0, 1] corresponding to the rule given
there.
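The four-threshold structure of Theorem 2 can be sketched as a function that maps O1's belief on H = 0 to a message. This is an illustrative sketch only: the name `four_threshold_message` is hypothetical, and the treatment of ties at the thresholds (closed stopping intervals, with the "1" region taking precedence when two thresholds coincide) is an assumption.

```python
def four_threshold_message(belief_h0, alpha, beta, delta, theta):
    """Message of O1 under a four-threshold rule on the belief P(H=0).

    The belief interval [0, 1] is split, from left to right, into the
    regions b, 1, b, 0, b; any region may be empty when thresholds coincide.
    """
    if not 0.0 <= alpha <= beta <= delta <= theta <= 1.0:
        raise ValueError("need 0 <= alpha <= beta <= delta <= theta <= 1")
    if alpha <= belief_h0 <= beta:
        return "1"
    if delta <= belief_h0 <= theta:
        return "0"
    return "b"  # blank: postpone the final message
```

For instance, choosing alpha = 0 and beta = delta removes the first and third regions, so that `four_threshold_message` with thresholds (0, 0.25, 0.25, 0.75) sends 1 at belief 0, 0 at belief 1/2 and a blank at belief 1, as in the non-threshold rule of Example 1.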

IV. Qualitative Properties for Observer O2

A. Problem P1

Consider a fixed policy $\Gamma^1 = (\gamma^1_1, \gamma^1_2, \ldots, \gamma^1_{T^1})$ for O1. Then,
after O1 sends its final message, we can define the following
probability for O2:

$\pi^2_0 := P^{\Gamma^1}(H = 0 \mid Z^1_{1:\tau^1})$

This is O2's belief on the true hypothesis after having observed
the messages from O1 (that is, a sequence of $\tau^1 - 1$ blanks
and a final $Z^1_{\tau^1} \in \{0, 1\}$). Now, the optimization problem
for O2 is the classical centralized Wald problem [12] with the
prior probability given by $\pi^2_0$. It is well-known that the optimal
policy for the Wald problem is a rule of the form:

$$U^2_k = \begin{cases} 1 & \text{if } \pi^2_k \le w^1_k \\ N & \text{if } w^1_k < \pi^2_k < w^2_k \\ 0 & \text{if } \pi^2_k \ge w^2_k \end{cases}$$

where $\pi^2_k$ is the belief on the hypothesis after k observations,

$$\pi^2_k(Y^2_{1:k}) := P^{\Gamma^1}(H = 0 \mid Z^1_{1:\tau^1}, Y^2_{1:k}) = \frac{P(Y^2_{1:k} \mid H = 0)\,\pi^2_0}{P(Y^2_{1:k} \mid H = 0)\,\pi^2_0 + P(Y^2_{1:k} \mid H = 1)\,(1 - \pi^2_0)},$$

and $w^1_k \le w^2_k$, for $k = 0, 1, 2, \ldots, T^2 - 1$, and $w^1_{T^2} = w^2_{T^2}$ are
the optimal thresholds for the Wald problem with horizon $T^2$.
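The Wald thresholds mentioned above can be approximated numerically by backward induction on a grid of beliefs. The following is a rough sketch, not the paper's procedure: it assumes a finite observation alphabet whose conditional distribution does not change with time, represents each value function on a uniform grid with linear interpolation, and uses the hypothetical names `wald_value_functions` and `continue_interval`. Here `J[u][h]` stands for the terminal cost J(u, h) and `cost` for the per-measurement cost.

```python
import numpy as np


def wald_value_functions(lik_h0, lik_h1, cost, J, horizon, grid_size=2001):
    """Backward induction for a finite-horizon sequential test.

    lik_h0[y], lik_h1[y]: P(y | H=0), P(y | H=1) over a finite alphabet.
    Returns (grid, values, stop), where values[k] approximates the value
    function after k measurements and stop is the cost of stopping now.
    """
    lik_h0 = np.asarray(lik_h0, dtype=float)
    lik_h1 = np.asarray(lik_h1, dtype=float)
    grid = np.linspace(0.0, 1.0, grid_size)  # beliefs P(H = 0)
    stop = np.minimum(grid * J[0][0] + (1 - grid) * J[0][1],   # declare 0
                      grid * J[1][0] + (1 - grid) * J[1][1])   # declare 1
    values = [None] * (horizon + 1)
    values[horizon] = stop.copy()  # no measurements left: must stop
    for k in range(horizon - 1, -1, -1):
        cont = np.full_like(grid, cost)  # pay for one more measurement
        for a, b in zip(lik_h0, lik_h1):
            p_y = a * grid + b * (1 - grid)  # predictive probability of y
            post = np.divide(a * grid, p_y,
                             out=np.zeros_like(grid), where=p_y > 0)
            cont += p_y * np.interp(post, grid, values[k + 1])
        values[k] = np.minimum(stop, cont)
    return grid, values, stop


def continue_interval(grid, value, stop, tol=1e-9):
    """Approximate endpoints of the region where continuing beats stopping."""
    idx = np.flatnonzero(value < stop - tol)
    if idx.size == 0:
        return None
    return float(grid[idx[0]]), float(grid[idx[-1]])
```

The two numbers returned by `continue_interval` for `values[k]` play the role of the lower and upper thresholds at step k, up to the grid resolution.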



B. Information State in Problem P2

Consider a fixed policy $\Gamma^1 = (\gamma^1_1, \gamma^1_2, \ldots, \gamma^1_{T^1})$ for O1.
Define the following probability for O2:

$\pi^2_t(Y^2_{1:t}, Z^1_{1:t}) := P^{\Gamma^1}(H = 0 \mid Y^2_{1:t}, Z^1_{1:t})$

$\pi^2_t$ is observer 2's belief on the hypothesis based on its
observations till time t and the messages received from O1
till time t (where the messages from O1 could be all blanks
or some blanks terminated by a 0 or 1). For t = 0, we have
$\pi^2_0 = p_0$.

The following theorem shows that $\pi^2_t$ and $Z^1_{1:t}$ together form
an information state for O2.

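Theorem 3: In Problem P2, with an arbitrary but fixed
policy $\Gamma^1$ of O1, there is an optimal policy for O2 of the
form:

$U^2_t = \gamma^2_t(Z^1_{1:t}, \pi^2_t)$    (15)

for $t = 1, 2, \ldots, T^2$. Moreover, this optimal policy can be
determined by the following dynamic program:

$V_{T^2}(z^1_{1:T^1}, \pi) := \min\{$
  $E^{\Gamma^1}[J(0, H) \mid \pi^2_{T^2} = \pi]$,
  $E^{\Gamma^1}[J(1, H) \mid \pi^2_{T^2} = \pi]$
$\}$    (16)

and for $k = (T^2 - 1), \ldots, 1$,

$V_k(z^1_{1:k}, \pi) := \min\{$
  $E^{\Gamma^1}[J(0, H) \mid \pi^2_k = \pi]$,
  $E^{\Gamma^1}[J(1, H) \mid \pi^2_k = \pi]$,
  $c^2 + E^{\Gamma^1}[V_{k+1}(Z^1_{1:k+1}, \pi^2_{k+1}) \mid \pi^2_k = \pi,\ Z^1_{1:k} = z^1_{1:k}]$
$\}$    (17)

Proof: See Appendix C. ■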



Observe that in the last term of (17), which corresponds to
the cost of postponing the final decision at time k, we have $\pi^2_k$
as well as all messages from O1 in the conditioning variables.
It is because of this term that we need the entire sequence of
messages as a part of the information state. To intuitively see
why these messages are needed in the conditioning, note that
the cost of continuing depends on future messages from O1.
In order to form a belief on future messages, O2 needs a belief
on the hypothesis and (since O1 has perfect recall) a belief on
all observations of O1 so far. Clearly, the messages received
till time k provide information about the observations of O1
till time k and are therefore included in the information state.

We can now prove the following lemma about the value
functions $V_k$.

Lemma 2: The value function at $T^2$ can be expressed as:

$V_{T^2}(z^1_{1:T^1}, \pi) := \min\{l^0(\pi), l^1(\pi)\}$    (18)

where $l^0$ and $l^1$ are affine functions of $\pi$ that are independent
of the choice of O1's policy $\Gamma^1$. Also, the value function at
time k can be expressed as:

$V_k(z^1_{1:k}, \pi) := \min\{l^0(\pi), l^1(\pi), G_k(z^1_{1:k}, \pi)\}$    (19)

where, for each realization $z^1_{1:k}$ of messages from O1, $G_k$ is
a concave function of $\pi$ that depends on the choice of O1's
policy, $\Gamma^1$.

Proof: See Appendix D. ■
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Theorem 4: For a fixed policy of 01, an optimal policy 
of 02 is of the form: 

U2^^[ 1 if TT^^ < 

[ if TTya > OlT2 
( 1 if^^<^^(^l^) 

Ul^l N if akiZl,) < nl < PkiZl,) 

where < UkiZlf.) < /^^(Zj.j,) < 1 are thresholds that 
depend on sequence of messages received from Ol {Z\.^). 

Proof: At any time $k$, if $\pi^2_k = 0$, then it is optimal to
stop and declare the hypothesis to be 1, since the cost of continuing
will be at least $c^2 + J(1,1)$, which is more than $J(1,1)$, the
cost of immediately stopping and declaring $U^2_k = 1$. Similarly,
at $\pi^2_k = 1$, it is optimal to stop and declare $U^2_k = 0$. These
observations, along with the fact that the value functions $V_k$ are
the minimum of affine and concave functions for each realization
of the messages received, imply the result of the theorem. ■
Thus, according to Theorem 4, the thresholds to be used at
time $k$ by O2 depend on the sequence of messages received
from O1 until time $k$. This kind of parametric characterization
may not appear very appealing since for each time $k$ one may
have to know a number of possible thresholds, one for each
possible realization of messages $z^1_{1:k}$. We will now argue that
there is in fact a simple representation of the thresholds. Note
that after time $\tau^1$, when O1 sends a final message, O2 is
faced with a classical Wald problem with an available time
horizon of $T^2 - \tau^1$. Now suppose that the classical Wald
thresholds are available for a time horizon of length $T^2$;
let us call these $(w^1_0, w^2_0), (w^1_1, w^2_1), (w^1_2, w^2_2), \ldots, w_{T^2}$. Then the
Wald thresholds for a problem with time horizon $T^2 - k$ are
simply $(w^1_k, w^2_k), (w^1_{k+1}, w^2_{k+1}), (w^1_{k+2}, w^2_{k+2}), \ldots, w_{T^2}$.

Thus, once O2 hears a final message from O1, it starts using
the classical Wald thresholds from that time onwards. In other
words, O2's operation is described by the following simple
algorithm:

• From time $k = 1$ onwards, the optimal policy is to use a
threshold rule given by 2 numbers $\alpha_k(b_{1:k})$ and $\beta_k(b_{1:k})$,
until O1 sends its final message $Z^1_k \in \{0, 1\}$. (As before,
$b_{1:k}$ stands for a sequence of $k$ blank messages.)

• If O1 sends the final message at time $k$, start using Wald
thresholds: $(w^1_k, w^2_k), \ldots, w_{T^2}$.

Thus O2's optimal policy is completely characterized
by just two tables of thresholds:
$[(\alpha_1(b_{1:1}), \beta_1(b_{1:1})), (\alpha_2(b_{1:2}), \beta_2(b_{1:2})), \ldots, (\alpha_{T^1}(b_{1:T^1}), \beta_{T^1}(b_{1:T^1}))]$ and the
Wald thresholds $[(w^1_0, w^2_0), (w^1_1, w^2_1), (w^1_2, w^2_2), \ldots, w_{T^2}]$.
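The two-table description of O2's operation can be sketched in code. This is only an illustration under stated assumptions, not the authors' implementation: the function name `o2_decision`, the dictionary layout of the two tables, the encoding of a blank message as the string "b" and of the null decision as the string "N" are all hypothetical choices made for this example. Each table entry is read as a (lower, upper) pair of thresholds on O2's belief that H = 0.

```python
def o2_decision(k, belief, messages, blank_table, wald_table):
    """Return O2's decision at time k: 1, 0, or "N" (keep observing).

    belief      -- O2's current belief that H = 0
    messages    -- messages received from O1 up to time k ("b" = blank)
    blank_table -- hypothetical dict: k -> (alpha_k, beta_k), used while
                   every message so far has been blank
    wald_table  -- hypothetical dict: k -> (lower, upper) Wald thresholds,
                   used once a final message (0 or 1) has arrived
    """
    all_blank = all(m == "b" for m in messages[:k])
    lower, upper = (blank_table if all_blank else wald_table)[k]
    if belief <= lower:
        return 1    # belief in H = 0 is low: declare H = 1
    if belief >= upper:
        return 0    # belief in H = 0 is high: declare H = 0
    return "N"      # in between: take another measurement
```

A terminal time that uses a single threshold can be stored as a pair with equal entries, in which case the "N" branch is never reached.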

V. Optimal Policies 

In the previous sections, we identified qualitative properties
of the optimal policies for the two observers. Moreover, if
the policy $\Gamma^2$ ($\Gamma^1$) of O2 (O1) has been chosen already,
Theorem 1 (Theorem 3) provides a dynamic programming
solution to find an optimal policy $\Gamma^1$ of O1 ($\Gamma^2$ of O2) for
the given choice of $\Gamma^2$ ($\Gamma^1$). An iterative application of such
an approach may be used to identify a person-by-person optimal
pair of strategies. However, finding globally optimal or near
optimal strategies for such dynamic team problems remains
a challenging task since it involves non-convex functional
optimization [13]. In this section, we will give a sequential
decomposition of the global optimization problem. Such a
decomposition provides a systematic methodology to find
globally optimal or near-optimal policies for the two observers.

A. Sequential Decomposition for Problem P1

In Problem P1, observer 2 waits to receive a final message
from observer 1 before it starts taking its measurements. After
receiving the final message, observer 2 is faced with the
centralized sequential detection problem studied by Wald. For
the Wald problem, the thresholds characterizing the optimal
policy and the cost of the optimal policy are known. For a
Wald problem with horizon $T$ and a starting belief $\pi$ on the
event $\{H = 0\}$, the cost of using the optimal Wald thresholds
is a function of the belief $\pi$ which we will denote by $K^T(\pi)$.
Since the Wald thresholds for observer 2 are known (or can be
calculated as in [12]), the designer's task in Problem P1 is to
find the best set of thresholds to be used by observer 1. Finding
the best thresholds for all times $t = 1$ to $T^1$ is a formidable
optimization problem. Firstly, the system objective (equation
(7)) is a complicated function of the thresholds selected for
observer 1. Moreover, the objective must be optimized over the
space of sequences of thresholds to be used from time $t = 1$
to $T^1$. Below, we show that the optimization problem can in
principle be solved in a sequential manner. In the resulting
sequential decomposition, at each step the optimization is
over the set of thresholds to be used at a single time instant
instead of the space of sequences of thresholds from time 1 to
$T^1$. Though the original optimization problem is decomposed
into several "simpler" optimization problems, each of these
remains difficult nonetheless. We believe that the decomposed
problems may be more amenable to approximation techniques.
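To make the role of the Wald cost function concrete, the following sketch approximates it by backward induction on a grid of beliefs. It is a standard finite-horizon optimal-stopping recursion written under assumptions that are not taken from the paper or from [12]: observations come from a finite alphabet with likelihoods `lik[h][y]` = P(Y = y | H = h), the terminal cost is given as `J[u][h]`, `c` is the cost per additional observation, `horizon` counts the observations still allowed, and values between grid points are linearly interpolated. The name `wald_cost` is hypothetical.

```python
def wald_cost(horizon, c, lik, J, n=201):
    """Approximate the optimal Wald cost as a function of the belief.

    Returns (grid, value), where value[i] approximates the optimal
    expected cost when the belief that H = 0 equals grid[i].
    """
    grid = [i / (n - 1) for i in range(n)]

    def stop_cost(pi):
        # Expected terminal cost of the better of the two declarations.
        return min(pi * J[u][0] + (1.0 - pi) * J[u][1] for u in (0, 1))

    def interp(values, pi):
        # Linear interpolation of grid values at an arbitrary belief.
        x = pi * (n - 1)
        i = min(int(x), n - 2)
        w = x - i
        return (1.0 - w) * values[i] + w * values[i + 1]

    value = [stop_cost(pi) for pi in grid]      # no observations left
    for _ in range(horizon):
        updated = []
        for pi in grid:
            cont = c                            # pay for one more observation
            for y in range(len(lik[0])):
                p_y = pi * lik[0][y] + (1.0 - pi) * lik[1][y]
                if p_y > 0.0:
                    # Bayes update of the belief, then cost-to-go.
                    cont += p_y * interp(value, pi * lik[0][y] / p_y)
            updated.append(min(stop_cost(pi), cont))
        value = updated
    return grid, value
```

The continuation region, and hence approximate thresholds, can be read off as the grid points where the continuation term is strictly smaller than the stopping cost.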

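We first define the following:

Definition 1: For $t = 1, 2, \ldots, T^1$ and a given choice of
observer 1's decision functions from time instant 1 to $t - 1$,
that is, $\Gamma^1_{t-1} = (\gamma^1_1, \gamma^1_2, \ldots, \gamma^1_{t-1})$, define

$$\xi_t := P^{\Gamma^1_{t-1}}(H, \pi^1_t \mid Z^1_{1:t-1} = b_{1:t-1}).$$

For $t = 1, 2, \ldots, T^1$ and for a given choice of functions
$\Gamma^1_t = (\gamma^1_1, \gamma^1_2, \ldots, \gamma^1_t)$, define

$$\eta_t[z^1_t] := P^{\Gamma^1_t}(H, \pi^1_t \mid Z^1_{1:t-1} = b_{1:t-1}, Z^1_t = z^1_t)$$

where $z^1_t \in \{0, 1, b\}$.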

Lemma 3: Consider any policy $\gamma^1_t$, $t = 1, 2, \ldots, T^1$ for
observer 1 that is characterized by 4 thresholds $(\alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t)$,
for $t = 1, 2, \ldots, T^1 - 1$ and a threshold $\alpha^1_{T^1}$ at time $T^1$
(Theorem 2). Then,

i) There exist transformations $Q^1_t$ for $t = 1, 2, \ldots, T^1$ such
that

$$\eta_t[z^1_t] = Q^1_t(\xi_t, \gamma^1_t, z^1_t)$$

for $z^1_t \in \{0, 1, b\}$, and

ii) There exist transformations $Q^2_t$, $t = 1, 2, \ldots, T^1 - 1$ such
that

$$\xi_{t+1} = Q^2_t(\eta_t[b]).$$

Proof: We first prove the second part of the lemma. By
definition,

$$\xi_{t+1}(h, \pi^1) = P^{\Gamma^1_t}(H = h, \pi^1_{t+1} = \pi^1 \mid Z^1_{1:t} = b_{1:t}) = P^{\Gamma^1_t}(H = h, T_t(\pi^1_t, Y^1_{t+1}) = \pi^1 \mid Z^1_{1:t} = b_{1:t}) \tag{20}$$

where we used the fact that O1's belief at time $t + 1$ is a
function of its belief at time $t$ and the observation at time
$t + 1$, that is, $\pi^1_{t+1} = T_t(\pi^1_t, Y^1_{t+1})$ (see Appendix A, equation
(48)). The right hand side of (20) can further be written as:

$$\int_{y, \pi'} \left[ \mathbb{1}_{\{T_t(\pi', y) = \pi^1\}} \cdot P(Y^1_{t+1} = y \mid H = h) \cdot P^{\Gamma^1_t}(H = h, \pi^1_t = \pi' \mid Z^1_{1:t} = b_{1:t}) \right] = \int_{y, \pi'} \left[ \mathbb{1}_{\{T_t(\pi', y) = \pi^1\}} \cdot P(Y^1_{t+1} = y \mid H = h) \cdot \eta_t[b](h, \pi') \right] \tag{21}$$

The above integral is a function of $\eta_t[b]$ and known observation
statistics. Thus $\xi_{t+1} = Q^2_t(\eta_t[b])$, where $Q^2_t$ is given by the
expression in (21).

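For the first part of the lemma, consider

$$\eta_t[b](h, \pi^1) = \frac{P^{\Gamma^1_t}(H = h, \pi^1_t = \pi^1, Z^1_t = b \mid Z^1_{1:t-1} = b_{1:t-1})}{P^{\Gamma^1_t}(Z^1_t = b \mid Z^1_{1:t-1} = b_{1:t-1})} \tag{22}$$

Under the 4-threshold rule for observer 1, $Z^1_t = b$ if $\pi^1_t \in C_t$,
where $C_t := [0, \alpha^1_t) \cup (\beta^1_t, \delta^1_t) \cup (\theta^1_t, 1]$. Therefore, the above
probability can be written as:

$$\frac{\mathbb{1}_{\{\pi^1 \in C_t\}} \cdot P^{\Gamma^1_{t-1}}(H = h, \pi^1_t = \pi^1 \mid Z^1_{1:t-1} = b_{1:t-1})}{P^{\Gamma^1_{t-1}}(\pi^1_t \in C_t \mid Z^1_{1:t-1} = b_{1:t-1})} \tag{23}$$

The above equation is a function of $\xi_t$ and the thresholds
selected by $\gamma^1_t$. Similar analysis holds for $\eta_t[0]$ and $\eta_t[1]$. This
concludes the proof of the lemma. ■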
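The two transformations in Lemma 3 can be illustrated on a distribution with finitely many atoms, where the integrals in (21) become sums. The sketch below makes several assumptions that go beyond the text: observations take values in a finite alphabet with strictly positive likelihoods `lik[h][y]` = P(Y = y | H = h), a joint distribution over (H, belief) is stored as a dictionary mapping `(h, pi)` to a probability, and beliefs are rounded so that equal atoms merge. The names `belief_update`, `q2` and `q1_blank` are hypothetical.

```python
def belief_update(pi, y, lik):
    """Bayes update of the belief that H = 0 after observing y."""
    num = pi * lik[0][y]
    return num / (num + (1.0 - pi) * lik[1][y])


def q2(eta_b, lik):
    """Map eta_t[b] to xi_{t+1}: push every atom through one observation."""
    xi_next = {}
    for (h, pi), p in eta_b.items():
        for y, p_y in enumerate(lik[h]):
            key = (h, round(belief_update(pi, y, lik), 12))
            xi_next[key] = xi_next.get(key, 0.0) + p * p_y
    return xi_next


def q1_blank(xi, thresholds):
    """Map xi_t to eta_t[b]: condition on the belief lying in C_t."""
    alpha, beta, delta, theta = thresholds

    def in_c(pi):
        # C_t = [0, alpha) U (beta, delta) U (theta, 1]
        return pi < alpha or beta < pi < delta or pi > theta

    kept = {k: p for k, p in xi.items() if in_c(k[1])}
    total = sum(kept.values())
    if total == 0.0:
        raise ValueError("a blank message has probability zero")
    return {k: p / total for k, p in kept.items()}
```

Starting from a prior placed on a single belief value, one call to `q2` gives the joint distribution of the hypothesis and the first posterior belief; alternating `q1_blank` and `q2` then tracks that distribution along the all-blank message history.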

We can now present a sequential decomposition of Problem
P1.

Theorem 5: For $t = 1, 2, \ldots, T^1 - 1$, there exist functions
$\mathcal{R}_t(\xi_t, \alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t)$ and $\mathcal{R}^*_t(\xi_t)$ where

$$\mathcal{R}^*_t(\xi_t) = \inf_{\alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t} \mathcal{R}_t(\xi_t, \alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t)$$

and for $t = T^1$, there exist functions $\mathcal{R}_{T^1}(\xi_{T^1}, \alpha^1_{T^1})$ and
$\mathcal{R}^*_{T^1}(\xi_{T^1})$ where

$$\mathcal{R}^*_{T^1}(\xi_{T^1}) = \inf_{\alpha^1_{T^1}} \mathcal{R}_{T^1}(\xi_{T^1}, \alpha^1_{T^1})$$

such that the optimal thresholds can be evaluated from these
functions as follows:

1) Note that $\xi_1 := P(H, \pi^1_1)$ is fixed a priori and does not
depend on any design choice. The optimal thresholds at
$t = 1$ for O1 are given by optimizing parameters in the
definition of $\mathcal{R}^*_1(\xi_1)$.

2) Once O1's thresholds at $t = 1$ are fixed, $\eta_1[b]$ and hence
$\xi_2$ are fixed by Lemma 3. The optimal thresholds for O1
at time $t = 2$ are given by optimizing parameters in the
definition of $\mathcal{R}^*_2(\xi_2)$.

3) Continuing sequentially, $\xi_t$ is fixed by the choice of past
thresholds, and the optimal thresholds for O1 at time $t$
are given by optimizing parameters in the definition of
$\mathcal{R}^*_t(\xi_t)$.
Proof: We will prove the result by backward induction.
Consider first the final horizon for O1: $T^1$. Assume that a
designer has already specified functions $\gamma^1_1, \gamma^1_2, \ldots, \gamma^1_{T^1-1}$ for
O1. The designer has to select a function to be used by O1
at time $T^1$ in case the final message has not been already
sent (that is, $Z^1_{1:T^1-1} = b_{1:T^1-1}$). By Theorem 2, this function
is characterized by a single threshold $\alpha^1_{T^1}$. For any choice
of $\alpha^1_{T^1}$, the future cost for the designer is $K^{T^2}(\pi_0)$, where
$K^{T^2}(\cdot)$ is the cost of using optimal Wald thresholds with a
time-horizon $T^2$ and $\pi_0$ is O2's belief on $\{H = 0\}$ after
receiving $Z^1_{1:T^1}$. The expected future cost for the designer can
therefore be expressed as:

$$\begin{aligned}
& E\{K^{T^2}(\pi_0) \mid Z^1_{1:T^1-1} = b_{1:T^1-1}\} \\
&= K^{T^2}\big(P(H = 0 \mid Z^1_{T^1} = 0, Z^1_{1:T^1-1} = b_{1:T^1-1})\big) \cdot P(Z^1_{T^1} = 0 \mid Z^1_{1:T^1-1} = b_{1:T^1-1}) \\
&\quad + K^{T^2}\big(P(H = 0 \mid Z^1_{T^1} = 1, Z^1_{1:T^1-1} = b_{1:T^1-1})\big) \cdot P(Z^1_{T^1} = 1 \mid Z^1_{1:T^1-1} = b_{1:T^1-1}) \\
&= K^{T^2}\big(P(H = 0 \mid Z^1_{T^1} = 0, Z^1_{1:T^1-1} = b_{1:T^1-1})\big) \cdot P(\pi^1_{T^1} > \alpha^1_{T^1} \mid Z^1_{1:T^1-1} = b_{1:T^1-1}) \\
&\quad + K^{T^2}\big(P(H = 0 \mid Z^1_{T^1} = 1, Z^1_{1:T^1-1} = b_{1:T^1-1})\big) \cdot P(\pi^1_{T^1} \le \alpha^1_{T^1} \mid Z^1_{1:T^1-1} = b_{1:T^1-1})
\end{aligned} \tag{24}$$

where we used the fact that the probabilities in the arguments
of $K^{T^2}(\cdot)$ are marginals of $\eta_{T^1}[0]$ and $\eta_{T^1}[1]$ respectively and the
probabilities multiplying the functions $K^{T^2}$ are marginals of
$\xi_{T^1}$. Using Lemma 3, we can write (24) as

$$L_{T^1}\big(Q^1_{T^1}(\xi_{T^1}, 0, \alpha^1_{T^1}), Q^1_{T^1}(\xi_{T^1}, 1, \alpha^1_{T^1}), \xi_{T^1}, \alpha^1_{T^1}\big) =: \mathcal{R}_{T^1}(\xi_{T^1}, \alpha^1_{T^1})$$

Thus, for a fixed choice of functions $\gamma^1_1, \gamma^1_2, \ldots, \gamma^1_{T^1-1}$ used
till time $T^1 - 1$, the designer's future cost at $T^1$, if the final
message was not sent before $T^1$, is a function of $\xi_{T^1}$ (that
is induced by the choice of the past decision functions) and
the threshold $\alpha^1_{T^1}$ it selects at time $T^1$. To find the best
choice of threshold, the designer has to select $\alpha^1_{T^1}$ to minimize
$\mathcal{R}_{T^1}(\xi_{T^1}, \alpha^1_{T^1})$. Define

$$\mathcal{R}^*_{T^1}(\xi_{T^1}) = \inf_{\alpha^1_{T^1}} \mathcal{R}_{T^1}(\xi_{T^1}, \alpha^1_{T^1})$$

For a given $\xi_{T^1}$, the function $\mathcal{R}^*_{T^1}$ describes the optimal future
cost for the designer and the optimizing $\alpha^1_{T^1}$ gives the best
threshold.

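Now assume that $\mathcal{R}^*_{t+1}(\xi_{t+1})$ describes the designer's
optimal future cost from time $t + 1$. At time $t$, if the past
decision functions $\gamma^1_1, \gamma^1_2, \ldots, \gamma^1_{t-1}$ have been specified already,
the designer's task is to select thresholds $\alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t$ to be
used by O1 at time $t$. For a given choice of these thresholds,
the future cost for the designer is $K^{T^2}(\pi_0)$ if O1 sends a final
message at $t$. If a blank message is sent at $t$, the designer will
use the best threshold at the next time $t + 1$ and the future
cost will be $c^1 + \mathcal{R}^*_{t+1}(\xi_{t+1})$. The expected future cost for the
designer is therefore given as:

$$\begin{aligned}
& E\{c^1(\tau^1 - t) + c^2\tau^2 + J(U^2_{\tau^2}, H) \mid Z^1_{1:t-1} = b_{1:t-1}\} \\
&= K^{T^2}\big(P(H = 0 \mid Z^1_t = 0, Z^1_{1:t-1} = b_{1:t-1})\big) \cdot P(Z^1_t = 0 \mid Z^1_{1:t-1} = b_{1:t-1}) \\
&\quad + K^{T^2}\big(P(H = 0 \mid Z^1_t = 1, Z^1_{1:t-1} = b_{1:t-1})\big) \cdot P(Z^1_t = 1 \mid Z^1_{1:t-1} = b_{1:t-1}) \\
&\quad + [c^1 + \mathcal{R}^*_{t+1}(\xi_{t+1})] \cdot P(Z^1_t = b \mid Z^1_{1:t-1} = b_{1:t-1})
\end{aligned} \tag{25}$$

$$\begin{aligned}
&= K^{T^2}\big(P(H = 0 \mid Z^1_t = 0, Z^1_{1:t-1} = b_{1:t-1})\big) \cdot P(\delta^1_t \le \pi^1_t \le \theta^1_t \mid Z^1_{1:t-1} = b_{1:t-1}) \\
&\quad + K^{T^2}\big(P(H = 0 \mid Z^1_t = 1, Z^1_{1:t-1} = b_{1:t-1})\big) \cdot P(\alpha^1_t \le \pi^1_t \le \beta^1_t \mid Z^1_{1:t-1} = b_{1:t-1}) \\
&\quad + [c^1 + \mathcal{R}^*_{t+1}(Q^2_t(\eta_t[b]))] \cdot P(\pi^1_t \in C_t \mid Z^1_{1:t-1} = b_{1:t-1})
\end{aligned} \tag{26}$$

$$=: L_t(\mathcal{R}^*_{t+1}, \eta_t[0], \eta_t[1], \eta_t[b], \xi_t, \alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t) \tag{27}$$

where we used the fact that the probabilities in the arguments
of $K^{T^2}(\cdot)$ are marginals of $\eta_t[0]$ and $\eta_t[1]$ respectively and the
probabilities multiplying the functions $K^{T^2}$ and $\mathcal{R}^*_{t+1}$ are
marginals of $\xi_t$. Using Lemma 3, we can write (27) as a
function of $\xi_t$ (that is induced by the choice of past functions
used till time $t - 1$) and the thresholds selected at time $t$:

$$\mathcal{R}_t(\xi_t, \alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t) \tag{28}$$

To find the best choice of thresholds, the designer has to select
$(\alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t)$ to minimize $\mathcal{R}_t(\xi_t, \alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t)$. Define

$$\mathcal{R}^*_t(\xi_t) = \inf_{\alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t} \mathcal{R}_t(\xi_t, \alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t) \tag{29}$$

For a given $\xi_t$, the function $\mathcal{R}^*_t$ describes the optimal future
cost for the designer and the optimizing thresholds are the best
thresholds. The above analysis can be inductively repeated for
all time instants.

The optimal thresholds can therefore be evaluated as
follows: At $t = 1$, $\xi_1$ is fixed a priori, therefore one can use
$\mathcal{R}^*_1$ to find the best thresholds at time $t = 1$. Once these are
selected, $\xi_2$ can be found using Lemma 3 and one can use
$\mathcal{R}^*_2$ to find the best thresholds at time $t = 2$ and so on.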

■ 
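The forward evaluation described in steps 1) to 3) of Theorem 5 has a simple control flow, which the sketch below tries to capture. It is a schematic only: the minimizer of the value function and the two transformations of Lemma 3 are passed in as callables, and their names (`best_thresholds`, `q1_blank`, `q2`) and signatures are assumptions made for this illustration. Computing the value functions themselves (the backward pass) is the hard part and is not shown.

```python
def forward_design(xi_1, horizon, best_thresholds, q1_blank, q2):
    """Select O1's thresholds sequentially, one time instant at a time.

    xi_1            -- the a priori information state at t = 1
    horizon         -- T^1, the last time at which O1 may send a message
    best_thresholds -- hypothetical callable (t, xi) -> thresholds that
                       attain (or nearly attain) the infimum at time t
    q1_blank        -- hypothetical callable (xi, thresholds) -> eta[b]
    q2              -- hypothetical callable eta[b] -> next xi
    """
    xi, plan = xi_1, []
    for t in range(1, horizon + 1):
        thresholds = best_thresholds(t, xi)
        plan.append(thresholds)
        if t < horizon:
            # Information state along the all-blank message history.
            xi = q2(q1_blank(xi, thresholds))
    return plan
```

The point of the decomposition is visible in the loop: each step optimizes over the thresholds of a single time instant, given an information state that is fully determined by earlier choices.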

Discussion: The problem of choosing the optimal thresholds
for observer 1 can be viewed as a sequential problem for
the designer as follows: At each time $t$, the designer must
specify the thresholds to be used by observer 1 in case the
final message has not already been sent. In other words, at
each time $t$, one can think that the designer is aware of the
messages sent from O1 to O2 until $t - 1$ and, in case these were
only blanks, the designer must choose the thresholds to be used
by O1 at time $t$. Thus, the designer is faced with a sequential
optimization problem with a fixed temporal ordering of its
decisions. Observe also that the designer has perfect recall: it
knows all messages sent till time $t$. The designer, therefore,
has a sequential problem with a classical information structure
[15]. The proof of Theorem 5 essentially describes the
dynamic program for the designer's problem. The belief $\xi_t$ serves
as the designer's information state and the functions $\mathcal{R}^*_t(\xi_t)$ are
essentially the value functions of the dynamic program. This
approach of introducing a designer with access to the common
information between observers (that is, the information known
to both observers: the messages from O1 to O2 in Problem P1)
so as to convert a decentralized problem to one with classical
information structure is illustrated and fully explained in [16,
Section IV] for a communication problem. We refer the reader
to that paper for a detailed exposition of this approach.

In Problem P1, until the time $\tau^1$, the information available
to O2 consists only of the messages sent from O1. This
is the same information that the designer uses to select
the thresholds. Thus O2 can be thought of as playing the
role of the designer in the proof of Theorem 5. The fact
that the problem of choosing the thresholds can be viewed
from O2's perspective is crucial in determining the nature
of the information state for this problem. The form of our
information state and the approach of viewing the problem
from O2's perspective imitate the information state and the
philosophy adopted in [17] for a real-time point-to-point
communication problem with noiseless feedback, where the
problem of choosing the encoding functions can be viewed
from the decoder's perspective.

B. Sequential Decomposition for Problem P2 

In this section, we present a sequential decomposition
similar to Theorem 5 for Problem P2. In Problem P2, both
observers start taking measurements at time $t = 1$. Moreover,
O2 is allowed to stop before receiving the final message from
O1 (see the time-ordering in Fig. 2 for $t = 1, 2, \ldots$). In Problem P2,
the messages sent from O1 to O2 are still common information
among the two observers. The problem of choosing the optimal
thresholds for the two observers can still be viewed as a
sequential problem from the perspective of a designer who at
any time $t$ knows the common information. At each time $t$, the
designer must specify the thresholds to be used by observer 1
in case the final message has not already been sent. It also has
to specify, for each realization of messages from O1, the set
of thresholds to be used at O2. In other words, at each time
$t$, one can think that the designer knows the messages sent
from O1 to O2 and the designer must choose the thresholds
to be used by O1 and O2 at time $t$. The designer's problem
can therefore be viewed as a sequential optimization problem
with classical information structure.

Unlike Problem P1, O2's information no longer coincides
with the designer's information of all previous messages from
O1, since O2 has its own observations as well. The fact
that the designer's problem can no longer be viewed from
O2's perspective implies that the information state found for
Problem P1 no longer works for this problem. The main
challenge now is to find a suitable information state sufficient
for performance evaluation for the designer's problem. We
present such an information state and the resulting dynamic
program below.

As mentioned earlier, once observer 1 has sent its final
message to observer 2, the optimization problem for observer
2 becomes the well known centralized sequential detection
problem studied by Wald. The thresholds characterizing the
optimal policy and the cost of the optimal policy are known.
For a Wald problem with horizon $T$ and a starting belief $\pi$
on the event $\{H = 0\}$, the cost of using the optimal Wald
thresholds is a function of the belief $\pi$ which we denote
by $K^T(\pi)$. The designer's task is to select the sequence of
thresholds to be used by observer 1 and the sequence of
thresholds to be used by observer 2 until the final message has
been sent from O1 to O2. After O1's final message has been
sent, O2's thresholds are known to be the Wald thresholds with
appropriate time-horizon. We will now present a sequential
decomposition for the designer.

Recall that we defined observer 2's belief on $H$ as follows:

$$\pi^2_t := P^{\Gamma^1}(H = 0 \mid Y^2_{1:t}, Z^1_{1:t})$$

$\pi^2_t$ evolves in time as O2 gets more measurements and
messages. Once O2 has announced its final decision, its belief
on $H$ does not change with time (since O2 is no longer making
measurements or listening to messages from O1). We begin
with the following definition and lemma.

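Definition 2: For $t = 1, 2, \ldots, T^1$ and a given choice of
observer 1 and observer 2's strategies from time instant 1
to $t - 1$ (that is, $\Gamma^1_{t-1} = (\gamma^1_1, \gamma^1_2, \ldots, \gamma^1_{t-1})$ and
$\Gamma^2_{t-1} = (\gamma^2_1, \gamma^2_2, \ldots, \gamma^2_{t-1})$), define

$$D_t := \mathbb{1}_{\{\tau^2 \ge t\}}$$

where $\tau^2$ is the stopping time of O2 as defined in (6). For
$t = 1, 2, \ldots, T^1 - 1$ and for a given choice of strategies
($\Gamma^1_t = (\gamma^1_1, \gamma^1_2, \ldots, \gamma^1_t)$ and $\Gamma^2_{t-1} = (\gamma^2_1, \gamma^2_2, \ldots, \gamma^2_{t-1})$), define

$$\phi_t[z^1_t] := P^{\Gamma^1_t, \Gamma^2_{t-1}}(H, \pi^1_t, \pi^2_t, D_t \mid Z^1_{1:t-1} = b_{1:t-1}, Z^1_t = z^1_t)$$

where $z^1_t \in \{0, 1, b\}$.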

Lemma 4: Consider any policy $\gamma^1_t$, $t = 1, 2, \ldots, T^1$ for
observer 1 that is characterized by 4 thresholds $(\alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t)$,
for $t = 1, 2, \ldots, T^1 - 1$ and a threshold $\alpha^1_{T^1}$ at time $T^1$
(Theorem 2), and a policy $\gamma^2_t$, $t = 1, 2, \ldots, T^2$ which is
characterized by thresholds $(\alpha^2_t, \beta^2_t)$, $t = 1, 2, \ldots, T^1 - 1$ to
be used if O1 has not sent a final message and the Wald
thresholds $(w^1_t, w^2_t)$, $t = 1, 2, \ldots, T^2$ to be used if the final
message from O1 has been received. Then, we have:

i) There exist transformations $Q^1_t$ for $t = 1, 2, \ldots, T^1$ such
that

$$\phi_t[z^1_t] = Q^1_t(\psi_t, \gamma^1_t, z^1_t)$$

for $z^1_t \in \{0, 1, b\}$, and

ii) There exist transformations $Q^2_t$, $t = 1, 2, \ldots, T^1 - 1$ such
that

$$\psi_{t+1} = Q^2_t(\phi_t[b], \gamma^2_t).$$

Proof: See Appendix E. ■
We can now present a sequential decomposition of Problem
P2.

Theorem 6: For $t = 1, 2, \ldots, T^1 - 1$, there exist functions
$\mathcal{F}_t(\psi_t, \alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t)$ and $\mathcal{F}^*_t(\psi_t)$, and $\mathcal{G}_t(\phi_t[b], \alpha^2_t, \beta^2_t)$ and
$\mathcal{G}^*_t(\phi_t[b])$ where

$$\mathcal{F}^*_t(\psi_t) = \inf_{\alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t} \mathcal{F}_t(\psi_t, \alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t)$$

$$\mathcal{G}^*_t(\phi_t[b]) = \inf_{\alpha^2_t, \beta^2_t} \mathcal{G}_t(\phi_t[b], \alpha^2_t, \beta^2_t)$$

and for $t = T^1$, there exist functions $\mathcal{F}_{T^1}(\psi_{T^1}, \alpha^1_{T^1})$ and
$\mathcal{F}^*_{T^1}(\psi_{T^1})$ where

$$\mathcal{F}^*_{T^1}(\psi_{T^1}) = \inf_{\alpha^1_{T^1}} \mathcal{F}_{T^1}(\psi_{T^1}, \alpha^1_{T^1})$$

such that the optimal thresholds can be evaluated from these
functions as follows:

1) Note that $\psi_1$ is fixed a priori and does not depend on
any design choice. The optimal thresholds at $t = 1$ for
O1 are given by optimizing parameters in the definition
of $\mathcal{F}^*_1(\psi_1)$.

2) Once O1's thresholds are fixed, $\phi_1[b]$ is fixed by Lemma
4. The optimizing thresholds to be used by O2 if a
blank message was received are given by optimizing
parameters in the definition of $\mathcal{G}^*_1(\phi_1[b])$. In case a 0 or
1 was received from O1, the optimal thresholds for O2
from this time onwards are the Wald thresholds for a
finite horizon $T^2 - 1$.

3) Continuing sequentially, $\psi_t$ is fixed by the choice of past
thresholds and the optimal thresholds for O1 at time $t$
are given by optimizing parameters in the definition of
$\mathcal{F}^*_t(\psi_t)$. Once O1's thresholds are fixed, $\phi_t[b]$ is fixed by
Lemma 4. The optimizing thresholds to be used by O2 if
a blank message was received are given by optimizing
parameters in the definition of $\mathcal{G}^*_t(\phi_t[b])$. In case a 0 or
1 was received from O1, the optimal thresholds for O2
from this time onwards are the Wald thresholds for a
finite horizon $T^2 - t$.

Proof: See Appendix F. ■
As in Theorem 5, the sequential decomposition in Theorem
6 is a dynamic programming result for the designer's sequential
problem of choosing the thresholds for O1 and O2. At time
$t$, $\psi_t$ is the designer's information state just before selecting
the four thresholds to be used at O1 to decide its message $Z^1_t$,
whereas $\phi_t$ is the designer's information state just before selecting
the thresholds to be used by O2 to decide $U^2_t$ (see Fig. 2).
The actual form of the functions $\mathcal{F}_t$ and $\mathcal{G}_t$ is obtained by
backward induction in Appendix F.

VI. Infinite Horizon Problem

In this section we analyze infinite horizon analogues of
Problems P1 and P2. We first focus on Problem P2.

A. Problem P2 with Infinite Horizon

Consider the model of Problem P2 as described earlier.
We remove the restriction on the boundedness of the
stopping times, that is, $\tau^1$ and $\tau^2$ need not be bounded. The
optimization problem is to select policies $\Gamma^1 = (\gamma^1_1, \gamma^1_2, \ldots)$
and $\Gamma^2 = (\gamma^2_1, \gamma^2_2, \ldots)$ to minimize

$$E^{\Gamma^1, \Gamma^2}\{c^1\tau^1 + c^2\tau^2 + J(U^2_{\tau^2}, H)\} \tag{30}$$

where $\tau^1$, $\tau^2$ and $U^2$ are defined as in the finite-horizon
formulation (see equations (2) and (6)).
We assume that the cost parameters $c^1$, $c^2$ are finite positive
numbers and that $J(U^2, H)$ is non-negative and bounded by
a constant $L$ for all $U^2$ and $H$.

Remark: We can restrict attention to policies for which $E\{\tau^1\}$
and $E\{\tau^2\}$ are finite, since otherwise the expected cost would
be infinite. Thus, we have that $\tau^1$ and $\tau^2$ are almost surely
finite. However, the stopping times may not necessarily be
bounded even under optimal policies.

1) Qualitative Properties for Observer 2: Consider any
fixed policy $\Gamma^1$ for observer 1. We will provide structural
results on optimal policies for observer 2 that hold for any
choice of $\Gamma^1$. Consider the case when observer 2 has not
stopped before time $t$. Consider a realization of the information
available to O2 at time $t$, namely $y^2_{1:t}, z^1_{1:t}$, and let
$\pi^2_t = P^{\Gamma^1}(H = 0 \mid y^2_{1:t}, z^1_{1:t})$ be the realization of O2's belief on $H$. Let $\mathcal{A}^\infty$
be the set of all policies available to O2 at time $t$ after having
observed $y^2_{1:t}, z^1_{1:t}$, and let $\mathcal{A}^{T^2}$ be the subset of policies in
$\mathcal{A}^\infty$ for which the stopping time $\tau^2$ is less than or equal to
a finite horizon $T^2$ ($t \le T^2 < \infty$). Then, from the analysis
for the finite-horizon Problem P2, we know that there exist
value functions $V^{T^2}_t(z^1_{1:t}, \pi^2_t)$ such that

$$V^{T^2}_t(z^1_{1:t}, \pi^2_t) = \inf_{\Gamma^2 \in \mathcal{A}^{T^2}} E^{\Gamma^1}[c^1\tau^1 + c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^2_{1:t}, z^1_{1:t}] \tag{31}$$

This value function is the optimal finite horizon cost for
observer 2 with horizon $T^2$.

We define the following function:

$$V^\infty_t(y^2_{1:t}, z^1_{1:t}) = \inf_{\Gamma^2 \in \mathcal{A}^\infty} E^{\Gamma^1}[c^1\tau^1 + c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^2_{1:t}, z^1_{1:t}] \tag{32}$$

Lemma 5: i) The value functions $V^{T^2}_t(z^1_{1:t}, \pi^2_t)$ are
non-increasing in $T^2$ and bounded below by 0, hence the limit
$\lim_{T^2 \to \infty} V^{T^2}_t(z^1_{1:t}, \pi^2_t)$ exists.
ii) Moreover,

$$V^\infty_t(y^2_{1:t}, z^1_{1:t}) = \lim_{T^2 \to \infty} V^{T^2}_t(z^1_{1:t}, \pi^2_t).$$

Proof: See Appendix G. ■
We can now prove the following theorem:

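Theorem 7: For a fixed policy $\Gamma^1$ for O1, an optimal policy
for O2 is of the form:

$$U^2_t = \begin{cases} 1 & \text{if } \pi^2_t \le \alpha_t(Z^1_{1:t}) \\ N & \text{if } \alpha_t(Z^1_{1:t}) < \pi^2_t < \beta_t(Z^1_{1:t}) \\ 0 & \text{if } \pi^2_t \ge \beta_t(Z^1_{1:t}) \end{cases}$$

where $0 \le \alpha_t(Z^1_{1:t}) \le \beta_t(Z^1_{1:t}) \le 1$ are thresholds that depend
on the sequence of messages received from O1 ($Z^1_{1:t}$).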

Proof: Consider a realization $y^2_{1:t}, z^1_{1:t}$ of O2's
observations and messages from O1. Let $\pi^2_t$ be the realization
of O2's belief, where $\pi^2_t = P^{\Gamma^1}(H = 0 \mid z^1_{1:t}, y^2_{1:t})$. Since
$V^\infty_t(z^1_{1:t}, y^2_{1:t}) = \lim_{T^2 \to \infty} V^{T^2}_t(z^1_{1:t}, \pi^2_t)$, it follows that $V^\infty_t$
is a function only of $z^1_{1:t}$ and $\pi^2_t$. Since O2 at time $t$ has only
3 possible choices, we must have:

$$V^\infty_t(z^1_{1:t}, \pi^2_t) = \min\{ E^{\Gamma^1}[J(0,H) \mid \pi^2_t],\; E^{\Gamma^1}[J(1,H) \mid \pi^2_t],\; c^2 + E^{\Gamma^1}[V^\infty_{t+1}(Z^1_{1:t+1}, \pi^2_{t+1}) \mid \pi^2_t, z^1_{1:t}] \} \tag{33}$$

From Lemma 2, we know that the first two terms are affine
in $\pi^2_t$. From Lemma 5, we know that $V^\infty_{t+1}$ is the limit of a
sequence of finite-horizon value functions. Now, for a fixed
$z^1_{1:t+1}$, the finite horizon value functions are concave in $\pi^2_{t+1}$
(from Lemma 2), therefore, for a fixed $z^1_{1:t+1}$, the limit $V^\infty_{t+1}$
is concave in $\pi^2_{t+1}$ as well. Using the concavity of $V^\infty_{t+1}$ and
following the arguments in the proof of Lemma 2, we can
show that the third term in equation (33) is concave in $\pi^2_t$ for
a fixed $z^1_{1:t}$. Thus, for a given realization of $z^1_{1:t}$, the infinite
horizon value function is the minimum of two affine functions and one
concave function. Moreover, it is optimal for O2 to stop if
$\pi^2_t = 0$ or 1. Therefore, the optimal policy for O2 must be of
the form:

$$U^2_t = \begin{cases} 1 & \text{if } \pi^2_t \le \alpha_t(Z^1_{1:t}) \\ N & \text{if } \alpha_t(Z^1_{1:t}) < \pi^2_t < \beta_t(Z^1_{1:t}) \\ 0 & \text{if } \pi^2_t \ge \beta_t(Z^1_{1:t}) \end{cases}$$

■

As in the finite horizon problem, once observer 1 has sent
the final message to observer 2, observer 2 is faced with the
classical centralized Wald problem. With an infinite horizon,
the optimal Wald policies are characterized by stationary
thresholds (say, $(w^1, w^2)$) that do not change with time [12].
Thus, in the infinite horizon version of Problem P2, observer
2's operation can be described by the following algorithm:

• From time $k = 1$ onwards, the optimal policy is to use a
threshold rule given by 2 numbers $\alpha_k(b_{1:k})$ and $\beta_k(b_{1:k})$,
until O1 sends its final message $Z^1_k \in \{0, 1\}$.

• From the time O1 sends a final message, start using the
stationary Wald thresholds $(w^1, w^2)$.

2) Qualitative Properties for Observer 1: Consider a fixed
policy $\Gamma^2$ for O2 which belongs to the set of finite horizon
policies $\mathcal{A}^{T^2}$ with horizon $T^2$. We will show that given such
a policy for O2, observer 1's infinite horizon optimal policy
is characterized by 4 thresholds on its posterior belief. We
will employ arguments similar to those used in the previous
section.

Consider the case when observer 1 has not stopped before
time $t$. Consider a realization of the information available to
O1 at time $t$, namely $y^1_{1:t}$, and let $\pi^1_t = P(H = 0 \mid y^1_{1:t})$ be the
realization of O1's belief on $H$. Let $\mathcal{B}^\infty$ be the set of all
policies available to O1 at time $t$ after having observed $y^1_{1:t}$,
and let $\mathcal{B}^{T^1}$ be the subset of policies in $\mathcal{B}^\infty$ for which the
stopping time $\tau^1$ is less than or equal to a finite horizon
$T^1$ ($t \le T^1 < \infty$). Then, from the analysis for the
finite-horizon Problem P2, we know that there exist value functions
$V^{T^1}_t(\pi^1_t)$ such that

$$V^{T^1}_t(\pi^1_t) = \inf_{\Gamma^1 \in \mathcal{B}^{T^1}} E^{\Gamma^2}[c^1\tau^1 + c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^1_{1:t}] \tag{34}$$

where $\pi^1_t = P(H = 0 \mid y^1_{1:t})$.
We define the following function:

$$V^\infty_t(y^1_{1:t}) = \inf_{\Gamma^1 \in \mathcal{B}^\infty} E^{\Gamma^2}[c^1\tau^1 + c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^1_{1:t}] \tag{35}$$



Lemma 6: i) For a fixed finite-horizon policy $\Gamma^2$ of O2,
the value functions $V^{T^1}_t(\pi^1_t)$ for O1 are non-increasing
in $T^1$ and bounded below by 0, hence the limit
$\lim_{T^1 \to \infty} V^{T^1}_t(\pi^1_t)$ exists.
ii) Moreover,

$$V^\infty_t(y^1_{1:t}) = \lim_{T^1 \to \infty} V^{T^1}_t(\pi^1_t).$$

Proof: See Appendix H. ■
We can now state the following theorem:

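Theorem 8: For a fixed finite-horizon policy for O2, an
optimal policy for O1 is of the form:

$$Z^1_t = \begin{cases} b & \text{if } \pi^1_t < \alpha_t \\ 1 & \text{if } \alpha_t \le \pi^1_t \le \beta_t \\ b & \text{if } \beta_t < \pi^1_t < \delta_t \\ 0 & \text{if } \delta_t \le \pi^1_t \le \theta_t \\ b & \text{if } \pi^1_t > \theta_t \end{cases}$$

where $0 \le \alpha_t \le \beta_t \le \delta_t \le \theta_t \le 1$.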

Proof: Because of the above lemma, we conclude that
$V^\infty_t(y^1_{1:t})$ depends only on the realization of the belief
($\pi^1_t = P(H = 0 \mid y^1_{1:t})$). It is, moreover, a concave function
of $\pi^1_t$. The result of the theorem follows by using arguments
similar to those in the proof of Lemma 1. ■

Theorem 9: There exist globally $\epsilon$-optimal policies $G^1$, $G^2$
for observers 1 and 2 respectively, such that $G^1$ is
characterized by 4 time-varying thresholds.

Proof: Consider any $\epsilon/2$-optimal pair of policies $\Gamma^1, \Gamma^2$.
Then, by arguments used in Lemma 5, we know that there exists
a finite horizon policy $\Gamma^2_{T^2}$ such that the pair $\Gamma^1, \Gamma^2_{T^2}$ is at most
$\epsilon/2$ worse than $\Gamma^1, \Gamma^2$. Since $\Gamma^2_{T^2}$ is a finite horizon policy,
by Theorem 8, we conclude that O1 can use a 4-threshold rule
without losing any performance with respect to the policies
$\Gamma^1, \Gamma^2_{T^2}$. Thus, we have an $\epsilon$-optimal pair of policies where
O1's policy is characterized by 4 time-varying thresholds. ■



B. Problem P1 with Infinite Horizon

The above analysis for the infinite horizon version of Problem
P2 can be easily specialized to the case of Problem P1. In
particular, observer 2's problem is now the classical Wald
problem with infinite horizon; thus its optimal policy is
characterized by two stationary thresholds. Moreover, the arguments
of Lemma 6 and Theorems 8 and 9 can be repeated without
any modification to obtain the same qualitative properties for
observer 1 in Problem P1.



VII. Communication with M-ary Alphabet 

Consider models of Problem P1 or P2 with the following
modification: when observer 1 chooses to stop taking
measurements and send a message to observer 2, it can choose to
send one of $M$ possible choices from the set $\{0, 1, \ldots, M-1\}$.
Thus, observer 1's message at time $t$ to observer 2, which is
a function of all its observations,

$$Z^1_t = \gamma^1_t(Y^1_{1:t}) \tag{36}$$

belongs to the set $\{0, 1, \ldots, M-1, b\}$, where we use $b$ for blank
message, that is, no transmission. The sequence of functions
$\gamma^1_t$, $t = 1, 2, \ldots$, constitutes the policy of observer 1. Let $\tau^1$
be the stopping time when observer 1 sends a final message to
observer 2, that is,

$$\tau^1 = \min\{t : Z^1_t \in \{0, 1, \ldots, M-1\}\} \tag{37}$$

Observer 2's operation and the overall system objective are the
same as in Problem P1 or P2. Then, we have the following
result:

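Theorem 10: In Problems P1 or P2 where observer 1 can
send one of $M$ possible final messages, there is no loss of
optimality in restricting attention to policies for observer 1
that are of the form:

$$Z^1_{T^1} = \begin{cases} M-1 & \text{if } \pi^1_{T^1} \le \alpha^{M-1}_{T^1} \\ M-2 & \text{if } \alpha^{M-1}_{T^1} < \pi^1_{T^1} \le \alpha^{M-2}_{T^1} \\ \;\vdots \\ 1 & \text{if } \alpha^{2}_{T^1} < \pi^1_{T^1} \le \alpha^{1}_{T^1} \\ 0 & \text{if } \pi^1_{T^1} > \alpha^{1}_{T^1} \end{cases}$$

where $0 \le \alpha^{M-1}_{T^1} \le \ldots \le \alpha^{1}_{T^1} \le 1$ are $M - 1$
thresholds, and for $k = 1, 2, \ldots, T^1 - 1$,

$$Z^1_k = \begin{cases} b & \text{if } \pi^1_k < \alpha^{M-1}_k \\ M-1 & \text{if } \alpha^{M-1}_k \le \pi^1_k \le \beta^{M-1}_k \\ b & \text{if } \beta^{M-1}_k < \pi^1_k < \alpha^{M-2}_k \\ M-2 & \text{if } \alpha^{M-2}_k \le \pi^1_k \le \beta^{M-2}_k \\ \;\vdots \\ 0 & \text{if } \alpha^{0}_k \le \pi^1_k \le \beta^{0}_k \\ b & \text{if } \pi^1_k > \beta^{0}_k \end{cases}$$

where $0 \le \alpha^{M-1}_k \le \beta^{M-1}_k \le \alpha^{M-2}_k \le \ldots \le \alpha^{1}_k \le \beta^{1}_k \le
\alpha^{0}_k \le \beta^{0}_k \le 1$ are $2M$ thresholds.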

Proof: It is straightforward to extend the arguments of
Theorem 1 to show that for a fixed policy of observer 2, optimal
policies of observer 1 are functions of its posterior belief $\pi^1_t$.
Similarly, the proof of Lemma 1 can be extended to show that
the value function for observer 1 is the minimum of $M$ affine
functions of the belief, which represent the expected cost of
stopping and sending one of the $M$ symbols, and 1 concave
function of the belief that represents the expected cost of
continuing. Taking the minimum of affine and concave functions
will result in $M$ intervals of the belief space $[0, 1]$ where
it is optimal to stop and send one of the $M$ symbols. If at
some time $t$, the symbols are not ordered in the monotonically
decreasing way as specified in the result above, one can
permute the symbols in policies of O1 and O2 at time $t$ to
get the desired ordering without losing performance. ■
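The interval structure in Theorem 10 amounts to a quantizer on observer 1's belief. The sketch below shows one way to evaluate such a rule; it is an illustration only, and the function names, the ascending ordering of the threshold list, and the use of the string "b" for a blank message are assumptions made here. At the final time the belief is compared against $M - 1$ ordered thresholds; at earlier times each symbol has its own band and every belief outside all bands produces a blank.

```python
from bisect import bisect_left


def final_message(belief, thresholds):
    """Symbol sent at the final time.

    thresholds -- ascending list of the M - 1 thresholds; the lowest
                  beliefs map to symbol M - 1, the highest to symbol 0
    """
    m = len(thresholds) + 1
    # bisect_left counts the thresholds strictly below the belief.
    return m - 1 - bisect_left(thresholds, belief)


def interim_message(belief, bands):
    """Symbol (or blank) sent before the final time.

    bands -- list of M (lower, upper) pairs in ascending order of
             belief, for symbols M - 1, M - 2, ..., 0 respectively
    """
    m = len(bands)
    for i, (lower, upper) in enumerate(bands):
        if lower <= belief <= upper:
            return m - 1 - i
    return "b"   # belief falls in a gap: keep taking measurements
```

With $M = 2$ the interim rule reduces to the 4-threshold form of the binary case, with one band for symbol 1 and one band for symbol 0.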

VIII. Extension to Multiple Sensors 

In this section, we extend our results to the case when
several peripheral sensors similar to observer 1 in Problems
P1 and P2 are required to send a single final message to
a coordinating sensor (similar to O2) which may be taking
its own measurements. We show that the peripheral sensors
have similar parametric characterizations of their optimal
policies as observer 1 in Problems P1 and P2. We obtain
a characterization of the coordinator's strategy that is similar to
that of O2.

Consider a group of $N$ peripheral sensors S1, S2, ..., SN and
a coordinating sensor S0. Each sensor can make repeated
observations on the random variable $H$. As before, we assume
that conditioned on $H$, the observations at different sensors
are independent, and the observations made at different time
instants at any sensor are also independent conditioned on $H$.




Fig. 3. Decentralized Detection with N Peripheral Sensors and 1 Coordinating
Sensor



Each of the peripheral sensors observes its own measurement
process $Y^i_t$, $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots$. At any time
$t$, the $i$th peripheral sensor can decide either to stop and
send a binary message 0 or 1 to the coordinating sensor or
to continue taking measurements. Each time the $i$th sensor
decides to continue taking measurements, a cost $c^i$ is incurred.
Each peripheral sensor sends only a single final message to
the coordinator. The policy $\Gamma^i := (\gamma^i_1, \gamma^i_2, \ldots)$ of the $i$th sensor is
of the form:

$$Z^i_t = \gamma^i_t(Y^i_{1:t}) \tag{38}$$

where $Z^i_t$ is the $i$th sensor's message at time $t$ to the coordinating
sensor. $Z^i_t$ belongs to the set $\{0, 1, b\}$, where we use $b$ for
blank message, that is, no transmission. The time $\tau^i$ is the
stopping time when the $i$th sensor sends a final message to the
coordinating sensor. When the coordinating sensor receives a
message, it knows which peripheral sensor sent that message.
At any time $t$, S0 can decide to stop
and declare a final decision on the hypothesis or take a new
measurement and wait for more messages from the peripheral
sensors. Each time S0 postpones its decision on the hypothesis,
it incurs a cost $c^0$. When S0 announces a final decision $U$ on
the hypothesis, it incurs a cost given as $J(U, H)$. Thus, the
coordinator's decision at time $t$ is given as:

$$U_t = \gamma^0_t(Y^0_{1:t}, Z^1_{1:t}, \ldots, Z^N_{1:t}) \tag{40}$$

$U_t$ belongs to the set $\{0, 1, N\}$, where we use $N$ for a null
decision, that is, a decision to continue waiting for more
messages and taking more measurements. The sequence of
functions $\Gamma^0 = (\gamma^0_1, \gamma^0_2, \ldots)$ is the policy of the coordinating
sensor. The time $\tau^0$ is the stopping time when S0 announces
its final decision on the hypothesis, that is,

$$\tau^0 = \min\{t : U_t \in \{0, 1\}\} \tag{41}$$

We consider the following problem.

Problem P3: Consider a finite horizon $T^i$ for the peripheral
sensors (that is, we require that $\tau^i \le T^i$) and a finite
horizon $T^0$ for the coordinating sensor, that is, $\tau^0 \le T^0$. The
optimization problem is to select policies $\Gamma^0, \Gamma^1, \ldots, \Gamma^N$ of all
the sensors to minimize

$$E\Big\{\sum_{i=0}^{N} c^i \tau^i + J(U_{\tau^0}, H)\Big\} \tag{42}$$

We now obtain a characterization of the peripheral sensors'
optimal policies. For the $i$th peripheral sensor, we define

$$\pi^i_t := P(H = 0 \mid Y^i_{1:t}) \tag{43}$$

Theorem 11: For any peripheral sensor $i$ and any fixed
choice of strategies $\Gamma^j$, for $j = 0, 1, \ldots, N$, $j \ne i$, there is
an optimal policy of the peripheral sensor $i$ of the form:

$$Z^i_{T^i} = \begin{cases} 1 & \text{if } \pi^i_{T^i} \le \alpha^i_{T^i} \\ 0 & \text{if } \pi^i_{T^i} > \alpha^i_{T^i} \end{cases}$$

where $0 \le \alpha^i_{T^i} \le 1$, and for $k = 1, 2, \ldots, T^i - 1$,

$$Z^i_k = \begin{cases} b & \text{if } \pi^i_k < \alpha^i_k \\ 1 & \text{if } \alpha^i_k \le \pi^i_k \le \beta^i_k \\ b & \text{if } \beta^i_k < \pi^i_k < \delta^i_k \\ 0 & \text{if } \delta^i_k \le \pi^i_k \le \theta^i_k \\ b & \text{if } \pi^i_k > \theta^i_k \end{cases}$$

where $0 \le \alpha^i_k \le \beta^i_k \le \delta^i_k \le \theta^i_k \le 1$.

The $i$th sensor plays the role of O1 in Problem P2 and the

Proof: The main idea of the proof is that once the poUcies 
of all sensors except i are fixed, the optimization problem for 
the i*^ sensor is similar to the problem for Ol in Problem P2. 

th 



coordinating sensor, that is. 



T = mm 



{t:Zle {0,1}} 



(39) 



The coordinating sensor observes its own measurement pro- 
cess, Y^, t = 1,2, .... In addition, it receives messages from all 
the peripheral sensors (we assume that when the coordinating 



coordinating sensor plays the role of 02. The observations of 
the coordinating sensor at time t can be defined as: 

{Y^,Zi,j = l,2,...,N,ji^i) 

Note that conditioned on H, the observations Y^ are indepen- 
dent of the i^^ sensor's observations. We can now follow the 
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arguments of Theorem 1 and 2 to conclude the result for the 



■th 



peripheral sensor. 
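The threshold structure in Theorem 11 can be illustrated with a short sketch. The function names and the numeric thresholds below are hypothetical and chosen only for illustration; the theorem asserts that thresholds of this form exist, not what their values are.

```python
def final_message(pi, alpha):
    """Message at the horizon T^i: the sensor must stop.

    pi is the belief P(H = 0 | own observations); alpha is a
    hypothetical threshold in [0, 1].
    """
    return 1 if pi <= alpha else 0


def interim_message(pi, alpha, beta, delta, theta):
    """Message at a time k < T^i, using four ordered thresholds.

    Returns 1 or 0 (stop and send that bit) or "b" (blank, continue).
    """
    if not 0 <= alpha <= beta <= delta <= theta <= 1:
        raise ValueError("thresholds must satisfy 0<=alpha<=beta<=delta<=theta<=1")
    if alpha <= pi <= beta:
        return 1
    if delta <= pi <= theta:
        return 0
    return "b"  # pi < alpha, beta < pi < delta, or pi > theta


if __name__ == "__main__":
    # Hypothetical thresholds, for illustration only.
    for belief in (0.05, 0.2, 0.5, 0.8, 0.95):
        print(belief, interim_message(belief, 0.1, 0.3, 0.7, 0.9))
```

Note that, unlike a classical two-threshold rule, the sensor may continue (send a blank) even at beliefs close to 0 or 1.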



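To find a characterization of the coordinating sensor's
policy, we fix the policies of all peripheral sensors and define

$$\pi^0_t(Y^0_{1:t}, Z^1_{1:t}, Z^2_{1:t}, \ldots, Z^N_{1:t}) := P(H = 0 \mid Y^0_{1:t}, Z^1_{1:t}, Z^2_{1:t}, \ldots, Z^N_{1:t}) \tag{44}$$

Theorem 12: For any fixed choice of policies of the peripheral
sensors, an optimal policy of the coordinating sensor is given
as

$$\gamma^0_{T^0}(\pi^0_{T^0}) = \begin{cases} 1 & \text{if } \pi^0_{T^0} \le \alpha_{T^0} \\ 0 & \text{if } \pi^0_{T^0} > \alpha_{T^0} \end{cases}$$

and, for $k = 1, 2, \ldots, T^0 - 1$,

$$\gamma^0_k(\pi^0_k, Z^1_{1:k}, \ldots, Z^N_{1:k}) = \begin{cases} 1 & \text{if } \pi^0_k \le \alpha_k(Z^1_{1:k}, Z^2_{1:k}, \ldots, Z^N_{1:k}) \\ \mathcal{N} & \text{if } \alpha_k(Z^1_{1:k}, \ldots, Z^N_{1:k}) < \pi^0_k < \beta_k(Z^1_{1:k}, \ldots, Z^N_{1:k}) \\ 0 & \text{if } \pi^0_k \ge \beta_k(Z^1_{1:k}, Z^2_{1:k}, \ldots, Z^N_{1:k}) \end{cases}$$

where $0 \le \alpha_k(Z^1_{1:k}, \ldots, Z^N_{1:k}) \le \beta_k(Z^1_{1:k}, \ldots, Z^N_{1:k}) \le 1$
are thresholds that depend on the sequence of messages received from
the peripheral sensors.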

Proof: The proof follows the arguments of Theorem 3 
and Theorem 4. ■ 
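As a rough illustration of Theorem 12, the sketch below implements a coordinator rule whose two thresholds are looked up from the message history. The lookup table `thresholds`, its keys, and all numeric values are assumptions made for this example; the theorem only states that such message-dependent thresholds exist.

```python
def coordinator_decision(pi0, messages, thresholds, default=(0.2, 0.8)):
    """Two-threshold rule for the coordinator at a time k < T^0.

    pi0        : belief P(H = 0 | own observations and messages so far)
    messages   : tuple with the message history of each peripheral
                 sensor, e.g. (("b", 1), ("b", "b")) for two sensors
    thresholds : hypothetical dict mapping a message history to (alpha, beta)
    Returns 1, 0, or "N" (null decision: keep waiting and measuring).
    """
    alpha, beta = thresholds.get(messages, default)
    if not 0 <= alpha <= beta <= 1:
        raise ValueError("need 0 <= alpha <= beta <= 1")
    if pi0 <= alpha:
        return 1
    if pi0 >= beta:
        return 0
    return "N"


if __name__ == "__main__":
    # Illustrative table: thresholds move once a sensor has reported a 1.
    table = {
        (("b",), ("b",)): (0.15, 0.85),
        ((1,), ("b",)): (0.30, 0.90),
    }
    print(coordinator_decision(0.25, (("b",), ("b",)), table))  # waits
    print(coordinator_decision(0.25, ((1,), ("b",)), table))    # declares 1
```

The same belief can therefore lead to different actions depending on which messages have arrived.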

IX. Conclusion 

We derived structural properties of optimal policies for two
observers in a sequential problem in decentralized detection
with a single, terminal communication from observer 1 to
observer 2. It was shown that classical two-threshold rules
no longer hold for observer 1. However, since observer 1's
problem is a stopping time problem, a finite parametric
characterization of optimal policies is still possible and is described
by at most 4 thresholds. A characterization of observer 2's
optimal policy was obtained as well. A methodology to find
the optimal policies in a sequential manner was presented. We
extended the qualitative results to the infinite-horizon versions
of the problem, to the problem with an increased communication
alphabet, and to a related problem with multiple sensors. In all
the problems we considered, there is only one message sent
from observer 1 to observer 2. It may still be possible to extend the
scope of communication between agents while satisfying
energy and data rate constraints. More general problems, where
there may be active communication from one observer to the
other even before the stopping time, remain to be explored.

Appendix I 
Proof of Theorem 1 

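Consider an arbitrary choice $\Gamma^2 = (\gamma^2_1, \gamma^2_2, \ldots, \gamma^2_{T^2})$ for O2's
policy. O2's policy is assumed to be fixed to $\Gamma^2$ throughout
this proof. Note that for a fixed $\Gamma^2$, $\tau^2$ and $U^2_{\tau^2}$ are functions
of O2's observation sequence $(Y^2_1, Y^2_2, \ldots, Y^2_{T^2})$ and the messages
received from O1, $(Z^1_1, \ldots, Z^1_{T^1})$. In other words, a policy $\Gamma^2$ of O2
induces a stopping time function $S^{\Gamma^2}$ and an estimate function
$R^{\Gamma^2}$ defined for all possible realizations of the observations of
O2 and messages from O1 such that

$$\tau^2 = S^{\Gamma^2}(Y^2_{1:T^2}, Z^1_{1:T^1}) \tag{45}$$

$$U^2_{\tau^2} = R^{\Gamma^2}(Y^2_{1:T^2}, Z^1_{1:T^1}) \tag{46}$$

Also, by a simple application of Bayes' rule, we know that
$\pi^1_{k+1}$ can be updated from $\pi^1_k$ and $Y^1_{k+1}$:

$$\pi^1_{k+1} = P(H = 0 \mid Y^1_{1:k+1}) = \frac{P(Y^1_{k+1} \mid H = 0)\,\pi^1_k}{P(Y^1_{k+1} \mid H = 0)\,\pi^1_k + P(Y^1_{k+1} \mid H = 1)\,(1 - \pi^1_k)} \tag{47}$$

Thus, we have that

$$\pi^1_{k+1} = T_k(\pi^1_k, Y^1_{k+1}) \tag{48}$$

where $T_k$ is defined by (47).

We will now show that under any policy for O1, the
expected future cost at time k for O1 is lower bounded by
the functions $V_k$ defined in Theorem 1. Consider any policy $\Gamma^1$
for O1. Under the policies $\Gamma^1$ and $\Gamma^2$, and for a realization
$y^1_{1:k}$ of O1's observations till time k, let $W_k(y^1_{1:k})$ be observer
1's expected future cost at time instant k if it has not sent its
final message before time k. That is,

$$W_k(y^1_{1:k}) := E^{\Gamma^1, \Gamma^2}\big[c^1(\tau^1 - k) + c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^1_{1:k}, Z^1_{1:k-1} = b_{1:k-1}\big] \tag{49}$$

First consider time $T^1$. We have

$$V_{T^1}(\pi) := \min\big\{ E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid \pi^1_{T^1} = \pi, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0], \; E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid \pi^1_{T^1} = \pi, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 1] \big\} \tag{50}$$

If observer 1 has not sent a final decision before time $T^1$,
then under policy $\Gamma^1$, O1 will send either 0 or 1 at time $T^1$.
O1's expected cost to go at $T^1$, if it sends a 0 at time $T^1$, is

$$W_{T^1}(y^1_{1:T^1}, 0) := E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^1_{1:T^1}, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0] \tag{51}$$

Similarly, if O1 sends a 1 at $T^1$, its expected cost to go is

$$W_{T^1}(y^1_{1:T^1}, 1) := E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^1_{1:T^1}, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 1] \tag{52}$$

Consider the expectation in (51). We can write it as

$$E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^1_{1:T^1}, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0]$$
$$= E^{\Gamma^2}[c^2 S^{\Gamma^2}(Y^2_{1:T^2}, Z^1_{1:T^1}) + J(R^{\Gamma^2}(Y^2_{1:T^2}, Z^1_{1:T^1}), H) \mid y^1_{1:T^1}, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0] \tag{53}$$
$$= E^{\Gamma^2}[c^2 S^{\Gamma^2}(Y^2_{1:T^2}, b_{1:T^1-1}, 0) + J(R^{\Gamma^2}(Y^2_{1:T^2}, b_{1:T^1-1}, 0), H) \mid y^1_{1:T^1}, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0] \tag{54}$$

where we used (45) and (46) in (53) and substituted $Z^1_{1:T^1}$
in (54) with the values specified in the conditioning term
of the expectation. Since the only random variables left in
the expectation in (54) are $Y^2_{1:T^2}$ and H, we can write this
expectation as

$$\sum_{h = 0, 1} \; \sum_{y^2_{1:T^2}} \Big[ P(y^2_{1:T^2}, H = h \mid y^1_{1:T^1}, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0) \cdot \big\{ c^2 S^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0) + J(R^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0), h) \big\} \Big] \tag{55}$$

Consider first the term for h = 0 in (55). Because of
the conditional independence of the observations at the two
observers, we can write this term as follows:

$$\sum_{y^2_{1:T^2}} \Big[ P(y^2_{1:T^2} \mid H = 0) \cdot P(H = 0 \mid y^1_{1:T^1}) \cdot \big\{ c^2 S^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0) + J(R^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0), 0) \big\} \Big]$$
$$= \sum_{y^2_{1:T^2}} \Big[ P(y^2_{1:T^2} \mid H = 0) \cdot \pi^1_{T^1}(y^1_{1:T^1}) \cdot \big\{ c^2 S^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0) + J(R^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0), 0) \big\} \Big] \tag{56}$$

Similarly, the term for h = 1 in (55) can be written as

$$\sum_{y^2_{1:T^2}} \Big[ P(y^2_{1:T^2} \mid H = 1) \cdot (1 - \pi^1_{T^1}(y^1_{1:T^1})) \cdot \big\{ c^2 S^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0) + J(R^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0), 1) \big\} \Big] \tag{57}$$

Combining equations (56) and (57), we see that the expectation
in (55) depends on $\pi^1_{T^1}(y^1_{1:T^1})$ and not on the entire sequence
$y^1_{1:T^1}$. Hence, we can replace $y^1_{1:T^1}$ by $\pi^1_{T^1}(y^1_{1:T^1})$ in the
conditioning in (51). Therefore,

$$W_{T^1}(y^1_{1:T^1}, 0) = E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^1_{1:T^1}, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0]$$
$$= E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid \pi^1_{T^1}(y^1_{1:T^1}), Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0]$$
$$\ge V_{T^1}(\pi^1_{T^1}(y^1_{1:T^1})) \tag{58}$$

where we used the definition of $V_{T^1}$ in (50). Exactly the same
arguments can be used if O1 sends a 1 at time $T^1$ to show
that

$$W_{T^1}(y^1_{1:T^1}, 1) := E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^1_{1:T^1}, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 1] \ge V_{T^1}(\pi^1_{T^1}(y^1_{1:T^1})) \tag{59}$$

Hence, we conclude that the following inequality always holds
for policy $\Gamma^1$:

$$W_{T^1}(y^1_{1:T^1}) \ge V_{T^1}(\pi^1_{T^1}(y^1_{1:T^1}))$$

Now consider time k. Assume that

$$W_{k+1}(y^1_{1:k+1}) \ge V_{k+1}(\pi^1_{k+1}(y^1_{1:k+1}))$$

If observer 1 has not sent a final decision before time k, then it
will send either a 0, 1 or b at time k. Therefore, O1's expected
cost to go at k, $W_k(y^1_{1:k})$, is either

$$W_k(y^1_{1:k}, 0) := E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^1_{1:k}, Z^1_{1:k-1} = b_{1:k-1}, Z^1_k = 0] \tag{60}$$

if $Z^1_k = 0$; or

$$W_k(y^1_{1:k}, 1) := E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^1_{1:k}, Z^1_{1:k-1} = b_{1:k-1}, Z^1_k = 1] \tag{61}$$

if $Z^1_k = 1$; or

$$W_k(y^1_{1:k}, b) := c^1 + E^{\Gamma^1, \Gamma^2}[W_{k+1}(y^1_{1:k}, Y^1_{k+1}) \mid y^1_{1:k}, Z^1_{1:k} = b_{1:k}] \tag{62}$$

if $Z^1_k = b$.

By arguments similar to those used at time $T^1$, we can show
that $V_k$ is a lower bound to the expressions in (60) and (61). That
is,

$$W_k(y^1_{1:k}, z^1_k) = E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^1_{1:k}, Z^1_{1:k-1} = b_{1:k-1}, Z^1_k = z^1_k] \ge V_k(\pi^1_k(y^1_{1:k})) \tag{63}$$

for $z^1_k \in \{0, 1\}$.

Consider equation (62). From the induction hypothesis at time
k + 1, we have that

$$W_{k+1}(y^1_{1:k+1}) \ge V_{k+1}(\pi^1_{k+1}(y^1_{1:k+1}))$$

which implies

$$E^{\Gamma^1}[W_{k+1}(y^1_{1:k}, Y^1_{k+1}) \mid y^1_{1:k}, Z^1_{1:k} = b_{1:k}]$$
$$\ge E^{\Gamma^1}[V_{k+1}(\pi^1_{k+1}(y^1_{1:k}, Y^1_{k+1})) \mid y^1_{1:k}, Z^1_{1:k} = b_{1:k}]$$
$$= E^{\Gamma^1}[V_{k+1}(T_k(\pi^1_k(y^1_{1:k}), Y^1_{k+1})) \mid y^1_{1:k}, Z^1_{1:k} = b_{1:k}] \tag{64}$$

The above expectation is a function of $\pi^1_k(y^1_{1:k})$ and the
conditional probability

$$P(Y^1_{k+1} \mid y^1_{1:k}, Z^1_{1:k} = b_{1:k})$$

which can be expressed as:

$$P(Y^1_{k+1} \mid H = 0) \cdot \pi^1_k(y^1_{1:k}) + P(Y^1_{k+1} \mid H = 1) \cdot (1 - \pi^1_k(y^1_{1:k}))$$

Thus the expectation in (64) depends only on $\pi^1_k(y^1_{1:k})$ and
not on the entire sequence $y^1_{1:k}$. Hence, it can be written as:

$$E^{\Gamma^1}[V_{k+1}(T_k(\pi^1_k(y^1_{1:k}), Y^1_{k+1})) \mid \pi^1_k(y^1_{1:k}), Z^1_{1:k} = b_{1:k}]$$

Equations (62) and (64) then imply that

$$W_k(y^1_{1:k}, b) = c^1 + E^{\Gamma^1}[W_{k+1}(y^1_{1:k}, Y^1_{k+1}) \mid y^1_{1:k}, Z^1_{1:k} = b_{1:k}]$$
$$\ge c^1 + E^{\Gamma^1}[V_{k+1}(T_k(\pi^1_k(y^1_{1:k}), Y^1_{k+1})) \mid \pi^1_k(y^1_{1:k}), Z^1_{1:k} = b_{1:k}]$$
$$= c^1 + E^{\Gamma^1}[V_{k+1}(\pi^1_{k+1}) \mid \pi^1_k(y^1_{1:k}), Z^1_{1:k} = b_{1:k}]$$
$$\ge V_k(\pi^1_k(y^1_{1:k})) \tag{65}$$

where the last inequality in (65) follows from the definition of $V_k$.
From equations (63) and (65), we conclude that the inequality
$W_k(y^1_{1:k}) \ge V_k(\pi^1_k(y^1_{1:k}))$ is true. Hence, by induction it holds
for all $k = T^1, T^1 - 1, \ldots, 2, 1$. Since $\Gamma^1$ was arbitrary, we conclude that
$V_k$ are lower bounds on the expected cost to go for O1 under
any policy for O1 (with O2's policy fixed at $\Gamma^2$).

A policy $\Gamma^*$ that always selects the minimizing option
in the definition of $V_k$ for each $\pi$ will achieve the lower
bounds $V_k$ on $W_k$ with equality for all k. Note that the total
expected cost of policy $\Gamma^1$ for O1 is $c^1 + E[W_1(Y^1_1)]$, which
is greater than or equal to $c^1 + E[V_1(\pi^1_1(Y^1_1))]$ (since we have shown that
$W_1(y^1_1) \ge V_1(\pi^1_1(y^1_1))$). Thus, we have that $\Gamma^*$ also achieves
the lower bound on the total expected cost for any policy. Hence,
it is optimal.

Thus, an optimal policy is given by selecting the minimizing
option in the definition of $V_k$ at each $\pi$. This establishes the
dynamic program of Theorem 1 and shows that there is an
optimal policy of the form $Z^1_k = \gamma^*_k(\pi^1_k)$, that is, a policy
that depends on the observations only through the belief $\pi^1_k$.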
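The Bayes update used throughout this proof (the map from the current belief and a new observation to the next belief) can be sketched in a few lines. The binary observation model below, with `p0 = P(Y = 1 | H = 0)` and `p1 = P(Y = 1 | H = 1)`, is an assumption made only for this illustration; the paper does not restrict the observation alphabet.

```python
def belief_update(pi, y, p0, p1):
    """One step of the Bayes update for the belief pi = P(H = 0 | past).

    y  : new observation, 0 or 1 (assumed binary for this sketch)
    p0 : assumed P(Y = 1 | H = 0)
    p1 : assumed P(Y = 1 | H = 1)
    """
    lik0 = p0 if y == 1 else 1.0 - p0   # P(y | H = 0)
    lik1 = p1 if y == 1 else 1.0 - p1   # P(y | H = 1)
    return lik0 * pi / (lik0 * pi + lik1 * (1.0 - pi))


def predictive(pi, y, p0, p1):
    """P(Y = y | past) expressed through the belief pi only."""
    lik0 = p0 if y == 1 else 1.0 - p0
    lik1 = p1 if y == 1 else 1.0 - p1
    return lik0 * pi + lik1 * (1.0 - pi)


if __name__ == "__main__":
    pi = 0.5
    for y in (1, 1, 0):
        pi = belief_update(pi, y, p0=0.2, p1=0.8)
        print(round(pi, 4))
```

The second function mirrors the step in the proof where the conditional probability of the next observation is written in terms of the belief alone, which is why the continuation cost depends on the observation history only through the belief.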



Appendix II 
Proof of Lemma 1 

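Consider the first term in the definition of $V_{T^1}$. Using the functions
$S^{\Gamma^2}$ and $R^{\Gamma^2}$ from equations (45) and (46) in the first term of (50),
we get

$$E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid \pi^1_{T^1} = \pi, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0]$$
$$= E^{\Gamma^2}[c^2 S^{\Gamma^2}(Y^2_{1:T^2}, Z^1_{1:T^1}) + J(R^{\Gamma^2}(Y^2_{1:T^2}, Z^1_{1:T^1}), H) \mid \pi^1_{T^1} = \pi, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0]$$
$$= E^{\Gamma^2}[c^2 S^{\Gamma^2}(Y^2_{1:T^2}, b_{1:T^1-1}, 0) + J(R^{\Gamma^2}(Y^2_{1:T^2}, b_{1:T^1-1}, 0), H) \mid \pi^1_{T^1} = \pi, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0] \tag{66}$$

where we substituted $Z^1_{1:T^1}$ in (66) with the values specified
in the conditioning term of the expectation. Since the only
random variables left in the expectation in (66) are $Y^2_{1:T^2}$ and
H, we can write this expectation as

$$\sum_{h = 0, 1} \; \sum_{y^2_{1:T^2} \in \mathcal{Y}^2_{1:T^2}} \Big[ P(y^2_{1:T^2}, H = h \mid \pi^1_{T^1} = \pi, Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0) \cdot \big\{ c^2 S^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0) + J(R^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0), h) \big\} \Big] \tag{67}$$

Consider first the term for h = 0 in (67). Because of
the conditional independence of the observations at the two
observers, we can write this term as follows:

$$\sum_{y^2_{1:T^2}} \Big[ P(y^2_{1:T^2} \mid H = 0) \cdot \pi \cdot \big\{ c^2 S^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0) + J(R^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0), 0) \big\} \Big]$$
$$= \pi \times \sum_{y^2_{1:T^2}} \Big[ P(y^2_{1:T^2} \mid H = 0) \cdot \big\{ c^2 S^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0) + J(R^{\Gamma^2}(y^2_{1:T^2}, b_{1:T^1-1}, 0), 0) \big\} \Big] \tag{68}$$
$$= \pi \times A_{T^1} \tag{69}$$

where $A_{T^1}$ is the factor multiplying $\pi$ in (68). Note that this
factor depends only on the choice of O2's policy. Similar
arguments for the term corresponding to h = 1 in (67) show
that it can be expressed as

$$(1 - \pi) \times B_{T^1} \tag{70}$$

Equations (69) and (70) imply that the first term of (50) is an
affine function of $\pi$, given as $A_{T^1} \cdot \pi + B_{T^1} \cdot (1 - \pi)$. Similar
arguments hold for the second term of (50). Hence, we have
that

$$V_{T^1}(\pi) = \min\{L^0_{T^1}(\pi), L^1_{T^1}(\pi)\}$$

Since $V_{T^1}$ is the minimum of two affine functions, it is a concave
function of $\pi$.

We now proceed inductively. Assume that $V_{k+1}$ is a concave
function of $\pi$ and consider $V_k$:

$$V_k(\pi) := \min\big\{ E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid \pi^1_k = \pi, Z^1_{1:k-1} = b_{1:k-1}, Z^1_k = 0], \; E^{\Gamma^2}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid \pi^1_k = \pi, Z^1_{1:k-1} = b_{1:k-1}, Z^1_k = 1], \; c^1 + E[V_{k+1}(T_k(\pi^1_k, Y^1_{k+1})) \mid \pi^1_k = \pi, Z^1_{1:k} = b_{1:k}] \big\} \tag{71}$$

Repeating the arguments used for $V_{T^1}$, it can be shown that the
first two terms in (71) are affine functions of $\pi$. These are the
functions $L^0_k$ and $L^1_k$ in Lemma 1. To prove that the third term
is a concave function of $\pi$, we use the induction hypothesis that
$V_{k+1}$ is a concave function of $\pi$. Then, $V_{k+1}$ can be written
as an infimum of affine functions:

$$V_{k+1}(\pi) = \inf_i \{\lambda_i \pi + \mu_i\} \tag{72}$$

Furthermore, the last term in (71) can be written as:

$$c^1 + E[V_{k+1}(T_k(\pi, Y^1_{k+1})) \mid \pi^1_k = \pi, Z^1_{1:k} = b_{1:k}] = c^1 + \sum_{y^1_{k+1}} \big[ \Pr(y^1_{k+1} \mid \pi^1_k = \pi, Z^1_{1:k} = b_{1:k}) \cdot V_{k+1}(T_k(\pi, y^1_{k+1})) \big] \tag{73}$$

Using the definition of $T_k$ from equation (47), each term in
the above summation can be written as

$$\Pr(y^1_{k+1} \mid \pi^1_k = \pi, Z^1_{1:k} = b_{1:k}) \cdot V_{k+1}\Big( \frac{P(y^1_{k+1} \mid H = 0) \cdot \pi}{P(y^1_{k+1} \mid H = 0) \cdot \pi + P(y^1_{k+1} \mid H = 1) \cdot (1 - \pi)} \Big)$$
$$= \{P(y^1_{k+1} \mid H = 0) \cdot \pi + P(y^1_{k+1} \mid H = 1) \cdot (1 - \pi)\} \cdot V_{k+1}\Big( \frac{P(y^1_{k+1} \mid H = 0) \cdot \pi}{P(y^1_{k+1} \mid H = 0) \cdot \pi + P(y^1_{k+1} \mid H = 1) \cdot (1 - \pi)} \Big) \tag{74}$$

Now using the characterization of $V_{k+1}$ in terms of the affine
functions (from equation (72)) in equation (74), we obtain

$$\inf_i \big\{ \lambda_i \cdot P(y^1_{k+1} \mid H = 0) \cdot \pi + \big( P(y^1_{k+1} \mid H = 0) \cdot \pi + P(y^1_{k+1} \mid H = 1) \cdot (1 - \pi) \big) \cdot \mu_i \big\} \tag{75}$$

Substituting this expression in (73), we obtain

$$c^1 + \sum_{y^1_{k+1}} \Big[ \inf_i \big\{ \lambda_i \cdot P(y^1_{k+1} \mid H = 0) \cdot \pi + \big( P(y^1_{k+1} \mid H = 0) \cdot \pi + P(y^1_{k+1} \mid H = 1) \cdot (1 - \pi) \big) \cdot \mu_i \big\} \Big] \tag{76}$$

Observe that the expression under the infimum is an affine
function of $\pi$. Hence, taking the infimum over i gives a
concave function of $\pi$ for each $y^1_{k+1}$. Since the sum of concave
functions is concave, the expression in (76) is a concave
function of $\pi$. We will call this function $G_k(\pi)$. Thus, the
value function at time k given by (71) can be expressed as:

$$V_k(\pi) := \min\{L^0_k(\pi), L^1_k(\pi), G_k(\pi)\} \tag{77}$$

Since $V_k$ is the minimum of a concave and two affine functions,
it is itself a concave function. This completes the argument
for the induction step, and (77) now holds for all $k = (T^1 - 1), \ldots, 2, 1$.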
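The induction above can be checked numerically on a belief grid. The sketch below is only an illustration under simplifying assumptions that are not in the paper: observations are binary with `p0 = P(Y = 1 | H = 0)` and `p1 = P(Y = 1 | H = 1)`, and the two affine stopping costs are given by hypothetical coefficients that are the same at every time step (in the lemma they depend on the time index and on O2's policy).

```python
import numpy as np


def value_functions(horizon, c1, stop0, stop1, p0, p1, n_grid=201):
    """Backward induction V_k = min{L0, L1, G_k} on a grid of beliefs.

    stop0, stop1 : hypothetical pairs (A, B); the stopping cost is
                   A * pi + B * (1 - pi), an affine function of pi.
    Returns the grid and the list [V_1, ..., V_T].
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    L0 = stop0[0] * grid + stop0[1] * (1.0 - grid)
    L1 = stop1[0] * grid + stop1[1] * (1.0 - grid)
    V = np.minimum(L0, L1)            # value at the horizon
    values = [V]
    for _ in range(horizon - 1):
        G = np.full(n_grid, float(c1))          # cost of continuing
        for lik0, lik1 in ((p0, p1), (1.0 - p0, 1.0 - p1)):  # y = 1, y = 0
            prob = lik0 * grid + lik1 * (1.0 - grid)   # P(y | belief)
            post = lik0 * grid / prob                  # Bayes update
            G += prob * np.interp(post, grid, V)
        V = np.minimum(np.minimum(L0, L1), G)
        values.append(V)
    return grid, values[::-1]


def is_concave(v, tol=1e-9):
    """Concavity on a uniform grid: second differences are non-positive."""
    return bool(np.all(np.diff(v, 2) <= tol))


if __name__ == "__main__":
    grid, vals = value_functions(5, 0.05, (0.0, 1.0), (1.0, 0.0), 0.3, 0.7)
    print([is_concave(v) for v in vals])
```

Each continuation term is the predictive probability times the next value function evaluated at the updated belief, which is exactly the form shown to be concave in the proof.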

Appendix III 
Proof of Theorem 3 

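Proof: Let $\Gamma^1 = (\gamma^1_1, \gamma^1_2, \ldots, \gamma^1_{T^1})$ be the fixed policy of
O1. By definition of $\pi^2_{k+1}$, we have

$$\pi^2_{k+1}(Y^2_{1:k+1}, Z^1_{1:k+1}) := P(H = 0 \mid Y^2_{1:k+1}, Z^1_{1:k+1}) = \frac{P(H = 0, Y^2_{k+1}, Z^1_{k+1} \mid Y^2_{1:k}, Z^1_{1:k})}{\sum_{h = 0, 1} P(H = h, Y^2_{k+1}, Z^1_{k+1} \mid Y^2_{1:k}, Z^1_{1:k})} \tag{78}$$

(although we omit the superscript $\Gamma^1$ for ease of notation, it
should be understood that these probabilities are defined with
a fixed $\Gamma^1$.)

Consider the numerator in (78). It can be written as:

$$P(Y^2_{k+1} \mid H = 0, Y^2_{1:k}, Z^1_{1:k+1}) \cdot P(Z^1_{k+1} \mid H = 0, Y^2_{1:k}, Z^1_{1:k}) \cdot P(H = 0 \mid Y^2_{1:k}, Z^1_{1:k})$$
$$= P(Y^2_{k+1} \mid H = 0) \cdot P(Z^1_{k+1} \mid H = 0, Z^1_{1:k}) \cdot \pi^2_k(Y^2_{1:k}, Z^1_{1:k}) \tag{79}$$

where we used the conditional independence of the observations
in (79). Under a fixed policy of O1, the $Z^1_k$'s are well-defined
random variables and hence the second term in (79) is well-defined.
Similar expressions can be obtained for the terms in
the denominator of (78). Thus, we have that $\pi^2_{k+1}$ is a function
of $\pi^2_k$, $Y^2_{k+1}$ and $Z^1_{1:k+1}$. That is,

$$\pi^2_{k+1} = \tilde{T}_k(\pi^2_k, Y^2_{k+1}, Z^1_{1:k+1}) \tag{80}$$

In the statement of Theorem 3, we defined $V_{T^2}$ as

$$V_{T^2}(z^1_{1:T^1}, \pi) := \min\{ E^{\Gamma^1}[J(0, H) \mid \pi^2_{T^2} = \pi], \; E^{\Gamma^1}[J(1, H) \mid \pi^2_{T^2} = \pi] \} \tag{81}$$

If O2 has not declared a final decision on the hypothesis
till $T^2 - 1$, and selects $U^2_{T^2} = 0$, then its future cost at time
$T^2$ is

$$W_{T^2}(y^2_{1:T^2}, z^1_{1:T^1}, 0) := E^{\Gamma^1}[J(0, H) \mid y^2_{1:T^2}, z^1_{1:T^1}]$$
$$= \pi^2_{T^2}(y^2_{1:T^2}, z^1_{1:T^1}) \cdot J(0, 0) + (1 - \pi^2_{T^2}(y^2_{1:T^2}, z^1_{1:T^1})) \cdot J(0, 1)$$
$$= E^{\Gamma^1}[J(0, H) \mid \pi^2_{T^2}(y^2_{1:T^2}, z^1_{1:T^1})] \tag{82}$$

which corresponds to the first term in the definition of $V_{T^2}$ at
$\pi^2_{T^2}(y^2_{1:T^2}, z^1_{1:T^1})$. A similar expression is true if $U^2_{T^2} = 1$.
In either case, we have from the definition of $V_{T^2}$ that for
$u \in \{0, 1\}$,

$$W_{T^2}(y^2_{1:T^2}, z^1_{1:T^1}, u) := E^{\Gamma^1}[J(u, H) \mid y^2_{1:T^2}, z^1_{1:T^1}] \ge V_{T^2}(z^1_{1:T^1}, \pi^2_{T^2}(y^2_{1:T^2}, z^1_{1:T^1})) \tag{83}$$

Thus, the optimal action at time $T^2$ is to select the minimizing
option in the definition of $V_{T^2}$, and the optimal future cost is
the value of $V_{T^2}$.

We will employ backward induction on the functions $V_k$
defined in Theorem 3 to show that they represent the optimal
value functions for O2. Consider time instant k. Assume $V_{k+1}$
gives the optimal cost to go (future cost) function at time k + 1.
We have, by definition,

$$V_k(z^1_{1:k}, \pi) := \min\big\{ E^{\Gamma^1}[J(0, H) \mid \pi^2_k = \pi], \; E^{\Gamma^1}[J(1, H) \mid \pi^2_k = \pi], \; c^2 + E^{\Gamma^1}[V_{k+1}(Z^1_{1:k+1}, \pi^2_{k+1}) \mid \pi^2_k = \pi, z^1_{1:k}] \big\} \tag{84}$$

At time k, for a realization $y^2_{1:k}, z^1_{1:k}$ of O2's observations and
O1's messages, the cost of stopping and declaring a decision
on the hypothesis at time k is either

$$W_k(y^2_{1:k}, z^1_{1:k}, 0) := E^{\Gamma^1}[J(0, H) \mid y^2_{1:k}, z^1_{1:k}] \tag{85}$$

or

$$W_k(y^2_{1:k}, z^1_{1:k}, 1) := E^{\Gamma^1}[J(1, H) \mid y^2_{1:k}, z^1_{1:k}] \tag{86}$$

By arguments similar to those at time $T^2$, the above terms are
the same as the first two terms of $V_k(z^1_{1:k}, \pi^2_k(y^2_{1:k}, z^1_{1:k}))$. The
cost of continuing at time k is

$$W_k(y^2_{1:k}, z^1_{1:k}, \mathcal{N}) = c^2 + E^{\Gamma^1}[V_{k+1}(Z^1_{1:k+1}, \pi^2_{k+1}) \mid y^2_{1:k}, z^1_{1:k}]$$
$$= c^2 + E^{\Gamma^1}[V_{k+1}(Z^1_{1:k+1}, \tilde{T}_k(\pi^2_k, Y^2_{k+1}, Z^1_{1:k+1})) \mid y^2_{1:k}, z^1_{1:k}] \tag{87}$$
$$= c^2 + E^{\Gamma^1}[V_{k+1}(z^1_{1:k}, Z^1_{k+1}, \tilde{T}_k(\pi^2_k, Y^2_{k+1}, z^1_{1:k}, Z^1_{k+1})) \mid y^2_{1:k}, z^1_{1:k}] \tag{88}$$

The expectation in (88) depends on $z^1_{1:k}$, $\pi^2_k(y^2_{1:k}, z^1_{1:k})$ and
$P^{\Gamma^1}(Y^2_{k+1}, Z^1_{k+1} \mid y^2_{1:k}, z^1_{1:k})$. This probability can be written
as:

$$P(Y^2_{k+1} \mid H = 0) \cdot P(Z^1_{k+1} \mid H = 0, z^1_{1:k}) \cdot \pi^2_k + P(Y^2_{k+1} \mid H = 1) \cdot P(Z^1_{k+1} \mid H = 1, z^1_{1:k}) \cdot (1 - \pi^2_k) \tag{89}$$

which depends only on $z^1_{1:k}$ and $\pi^2_k$. Thus, the cost of continuing
is the same as

$$c^2 + E^{\Gamma^1}[V_{k+1}(Z^1_{1:k+1}, \pi^2_{k+1}) \mid \pi^2_k(y^2_{1:k}, z^1_{1:k}), z^1_{1:k}]$$

which corresponds to the last term in the definition of $V_k$.
Consequently, the optimal action at time k is to select the
minimizing option in the definition of $V_k$, and the value of $V_k$ is
the optimal expected cost to go at time k. This completes the
proof of the assertion of Theorem 3. ■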
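Observer 2's belief recursion, which combines its own new observation with the likelihood of O1's latest message, can be sketched as follows. The function name and the representation of the likelihoods as pairs are assumptions made for this example; in particular, the message likelihoods depend on O1's fixed policy and are simply passed in here as numbers.

```python
def o2_belief_update(pi, y_lik, z_lik):
    """Update O2's belief pi = P(H = 0 | y^2_{1:k}, z^1_{1:k}).

    y_lik : (P(y | H = 0), P(y | H = 1)) for O2's new observation
    z_lik : (P(z | H = 0, past messages), P(z | H = 1, past messages))
            for O1's new message, determined by O1's fixed policy
    """
    num = y_lik[0] * z_lik[0] * pi
    den = num + y_lik[1] * z_lik[1] * (1.0 - pi)
    if den == 0.0:
        raise ValueError("observation/message pair has zero probability")
    return num / den


if __name__ == "__main__":
    # An uninformative message (equal likelihoods) leaves only the
    # effect of O2's own observation.
    print(o2_belief_update(0.5, (0.2, 0.8), (0.5, 0.5)))
    # A blank that is more likely under H = 0 pushes the belief up.
    print(o2_belief_update(0.5, (0.5, 0.5), (0.9, 0.1)))
```

Even a blank message carries information, because the probability that O1 stays silent generally differs under the two hypotheses.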

Appendix IV 
Proof of Lemma 2 

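Proof: The result of Lemma 2 for time $T^2$ follows from
the definition of $V_{T^2}$ since

$$E^{\Gamma^1}[J(0, H) \mid \pi^2_{T^2} = \pi] = \pi \cdot J(0, 0) + (1 - \pi) \cdot J(0, 1)$$

This corresponds to the line $l^0(\pi)$. Similarly,

$$E^{\Gamma^1}[J(1, H) \mid \pi^2_{T^2} = \pi] = \pi \cdot J(1, 0) + (1 - \pi) \cdot J(1, 1)$$

which corresponds to the line $l^1(\pi)$. Since, for any realization
of $z^1_{1:T^1}$, $V_{T^2}$ is the minimum of two affine functions of $\pi$, it is
concave in $\pi$ for each $z^1_{1:T^1}$.

Assume now that $V_{k+1}(z^1_{1:k+1}, \pi)$ is concave in $\pi$ for each
$z^1_{1:k+1}$. Then, we can write $V_{k+1}$ as:

$$V_{k+1}(z^1_{1:k+1}, \pi) = \inf_i \{\lambda_i(z^1_{1:k+1}) \cdot \pi + \mu_i(z^1_{1:k+1})\} \tag{90}$$

where $\lambda_i(z^1_{1:k+1})$ and $\mu_i(z^1_{1:k+1})$ are real numbers that
depend on $z^1_{1:k+1}$. Consider the value function at time k:

$$V_k(z^1_{1:k}, \pi) = \min\big\{ E^{\Gamma^1}[J(0, H) \mid \pi^2_k = \pi], \; E^{\Gamma^1}[J(1, H) \mid \pi^2_k = \pi], \; c^2 + E^{\Gamma^1}[V_{k+1}(Z^1_{1:k+1}, \pi^2_{k+1}) \mid \pi^2_k = \pi, z^1_{1:k}] \big\} \tag{91}$$

The first two terms in (91) correspond to the affine terms
$l^0$ and $l^1$. The last term in (91) can be written as:

$$c^2 + E^{\Gamma^1}[V_{k+1}(Z^1_{1:k+1}, \pi^2_{k+1}) \mid \pi^2_k = \pi, z^1_{1:k}]$$
$$= c^2 + E^{\Gamma^1}[V_{k+1}(Z^1_{1:k+1}, \tilde{T}_k(\pi, Y^2_{k+1}, Z^1_{1:k+1})) \mid \pi^2_k = \pi, z^1_{1:k}]$$
$$= c^2 + \sum_{y^2_{k+1}} \sum_{z^1_{k+1}} \big[ \Pr(y^2_{k+1}, z^1_{k+1} \mid \pi^2_k = \pi, z^1_{1:k}) \cdot V_{k+1}(z^1_{1:k+1}, \tilde{T}_k(\pi, y^2_{k+1}, z^1_{1:k+1})) \big] \tag{92}$$

We now use the fact that $\tilde{T}_k(\pi, y^2_{k+1}, z^1_{1:k+1})$ is given as

$$\frac{P(y^2_{k+1} \mid H = 0) \cdot P(z^1_{k+1} \mid H = 0, z^1_{1:k}) \cdot \pi}{P(y^2_{k+1} \mid H = 0) \cdot P(z^1_{k+1} \mid H = 0, z^1_{1:k}) \cdot \pi + P(y^2_{k+1} \mid H = 1) \cdot P(z^1_{k+1} \mid H = 1, z^1_{1:k}) \cdot (1 - \pi)} \tag{93}$$

(see equations (78) and (79)).

Focusing on one term of the summation in (92) and using
(90), we can write it as

$$\Pr(y^2_{k+1}, z^1_{k+1} \mid \pi^2_k = \pi, z^1_{1:k}) \cdot \inf_i \Big\{ \lambda_i(z^1_{1:k+1}) \cdot \frac{P(y^2_{k+1} \mid H = 0) \cdot P(z^1_{k+1} \mid H = 0, z^1_{1:k}) \cdot \pi}{\Pr(y^2_{k+1}, z^1_{k+1} \mid \pi^2_k = \pi, z^1_{1:k})} + \mu_i(z^1_{1:k+1}) \Big\} \tag{94}$$

Note that the expression outside the infimum in (94) is the
same as the denominator in the term multiplying $\lambda_i(z^1_{1:k+1})$
in (94). The expression (94) can now be written as

$$\inf_i \big\{ \lambda_i(z^1_{1:k+1}) \cdot P(y^2_{k+1} \mid H = 0) \cdot P(z^1_{k+1} \mid H = 0, z^1_{1:k}) \cdot \pi + \mu_i(z^1_{1:k+1}) \cdot \Pr(y^2_{k+1}, z^1_{k+1} \mid \pi^2_k = \pi, z^1_{1:k}) \big\} \tag{95}$$

Expanding the probability multiplying $\mu_i$, we can write (95)
as

$$\inf_i \big\{ \lambda_i(z^1_{1:k+1}) \cdot P(y^2_{k+1} \mid H = 0) \cdot P(z^1_{k+1} \mid H = 0, z^1_{1:k}) \cdot \pi + \mu_i(z^1_{1:k+1}) \cdot \big( P(y^2_{k+1} \mid H = 0) \cdot P(z^1_{k+1} \mid H = 0, z^1_{1:k}) \cdot \pi + P(y^2_{k+1} \mid H = 1) \cdot P(z^1_{k+1} \mid H = 1, z^1_{1:k}) \cdot (1 - \pi) \big) \big\} \tag{96}$$

For the given $z^1_{1:k+1}$ and $y^2_{k+1}$, the term in the infimum in
(96) is affine in $\pi$. Therefore, the expression in (96) is concave
in $\pi$. Thus, for the given realization of $z^1_{1:k}$, each term in the
summation in (92) is concave in $\pi$. Hence, the sum is concave
in $\pi$ as well. This establishes the structure of $V_k$ in Lemma 2.
To complete the induction argument, we only have to note that
since $V_k$ is the minimum of two affine functions and one concave
function of $\pi$, it is concave in $\pi$ (for each $z^1_{1:k}$). ■

Appendix V
Proof of Lemma 4

Proof: We first prove the second part of the lemma.
By definition, we have

$$\psi_{t+1}(h, \pi^1, \pi^2, D_{t+1} = 1) := P(H = h, \pi^1_{t+1} = \pi^1, \pi^2_t = \pi^2, D_{t+1} = 1 \mid Z^1_{1:t} = b_{1:t})$$
$$= P(H = h, T_t(\pi^1_t, Y^1_{t+1}) = \pi^1, \pi^2_t = \pi^2, D_{t+1} = 1 \mid Z^1_{1:t} = b_{1:t}) \tag{97}$$

where we used the fact that O1's belief at time t + 1 is a
function of its belief at time t and the observation at time
t + 1, that is, $\pi^1_{t+1} = T_t(\pi^1_t, Y^1_{t+1})$ (see Appendix I, (48)).
The right hand side (RHS) of (97) can further be written as:

$$\int_{y, \pi'} 1_{T_t(\pi', y) = \pi^1} \cdot P(Y^1_{t+1} = y \mid H = h) \cdot P(H = h, \pi^1_t = \pi', \pi^2_t = \pi^2, D_{t+1} = 1 \mid Z^1_{1:t} = b_{1:t})$$
$$= \int_{y, \pi'} 1_{T_t(\pi', y) = \pi^1} \cdot P(Y^1_{t+1} = y \mid H = h) \cdot P(H = h, \pi^1_t = \pi', \pi^2_t = \pi^2, D_t \cdot 1_{\alpha^2_t < \pi^2_t < \beta^2_t} = 1 \mid Z^1_{1:t} = b_{1:t}) \tag{98}$$

where we used the fact that if $Z^1_{1:t} = b_{1:t}$, then the event
$\{\tau^2 \ge t + 1\}$ is the same as $\{\tau^2 \ge t\} \cap \{\alpha^2_t < \pi^2_t < \beta^2_t\}$ and
hence $D_{t+1} = D_t \cdot 1_{\alpha^2_t < \pi^2_t < \beta^2_t}$. The RHS of (98) can be written
as:

$$\int_{y, \pi'} 1_{T_t(\pi', y) = \pi^1} \cdot P(Y^1_{t+1} = y \mid H = h) \cdot P(H = h, \pi^1_t = \pi', \pi^2_t = \pi^2, D_t = 1 \mid Z^1_{1:t} = b_{1:t}) \cdot 1_{\alpha^2_t < \pi^2 < \beta^2_t}$$
$$= \int_{y, \pi'} 1_{T_t(\pi', y) = \pi^1} \cdot P(Y^1_{t+1} = y \mid H = h) \cdot \phi_t[b](h, \pi', \pi^2, 1) \cdot 1_{\alpha^2_t < \pi^2 < \beta^2_t} \tag{99}$$

The expression given by (99) depends on $\phi_t[b]$, the thresholds
$\alpha^2_t, \beta^2_t$ specified by $\gamma^2_t$, and the observation statistics that are
known a priori. Thus $\psi_{t+1}(h, \pi^1, \pi^2, D_{t+1} = 1)$ is a function
of $\phi_t[b]$ and $\gamma^2_t$.

Similarly, $\psi_{t+1}(h, \pi^1, \pi^2, D_{t+1} = 0)$ can be written as:

$$\psi_{t+1}(h, \pi^1, \pi^2, D_{t+1} = 0)$$
$$= \int_{y, \pi'} 1_{T_t(\pi', y) = \pi^1} \cdot P(Y^1_{t+1} = y \mid H = h) \cdot P(H = h, \pi^1_t = \pi', \pi^2_t = \pi^2, D_t \cdot 1_{\alpha^2_t < \pi^2_t < \beta^2_t} = 0 \mid Z^1_{1:t} = b_{1:t})$$
$$= \int_{y, \pi'} 1_{T_t(\pi', y) = \pi^1} \cdot P(Y^1_{t+1} = y \mid H = h) \cdot P(H = h, \pi^1_t = \pi', \pi^2_t = \pi^2, D_t = 0 \mid Z^1_{1:t} = b_{1:t})$$
$$\quad + \int_{y, \pi'} 1_{T_t(\pi', y) = \pi^1} \cdot P(Y^1_{t+1} = y \mid H = h) \cdot P(H = h, \pi^1_t = \pi', \pi^2_t = \pi^2, D_t = 1 \mid Z^1_{1:t} = b_{1:t}) \cdot 1_{(\pi^2 \le \alpha^2_t) \cup (\pi^2 \ge \beta^2_t)}$$
$$= \int_{y, \pi'} 1_{T_t(\pi', y) = \pi^1} \cdot P(Y^1_{t+1} = y \mid H = h) \cdot \phi_t[b](h, \pi', \pi^2, 0)$$
$$\quad + \int_{y, \pi'} 1_{T_t(\pi', y) = \pi^1} \cdot P(Y^1_{t+1} = y \mid H = h) \cdot \phi_t[b](h, \pi', \pi^2, 1) \cdot 1_{(\pi^2 \le \alpha^2_t) \cup (\pi^2 \ge \beta^2_t)} \tag{100}$$

The RHS of (100) depends only on $\phi_t[b]$ and the thresholds
$\alpha^2_t, \beta^2_t$ specified by $\gamma^2_t$. This concludes the proof of the second
part of the lemma.

For the first part of the lemma, consider

$$\phi_t[b](h, \pi^1, \pi^2, 1) := P(H = h, \pi^1_t = \pi^1, \pi^2_t = \pi^2, D_t = 1 \mid Z^1_{1:t} = b_{1:t}) \tag{101}$$

To simplify this term, first note that

$$\pi^2_t(y^2_{1:t}, z^1_{1:t}) = \frac{P(H = 0, y^2_t, z^1_t \mid y^2_{1:t-1}, z^1_{1:t-1})}{\sum_{h = 0, 1} P(H = h, y^2_t, z^1_t \mid y^2_{1:t-1}, z^1_{1:t-1})} \tag{102}$$

The numerator in (102) can be written as:

$$P(y^2_t \mid H = 0, y^2_{1:t-1}, z^1_{1:t}) \cdot P(z^1_t \mid H = 0, y^2_{1:t-1}, z^1_{1:t-1}) \cdot P(H = 0 \mid y^2_{1:t-1}, z^1_{1:t-1})$$
$$= P(y^2_t \mid H = 0) \cdot P(z^1_t \mid H = 0, z^1_{1:t-1}) \cdot \pi^2_{t-1}(y^2_{1:t-1}, z^1_{1:t-1}) \tag{103}$$

where we used the conditional independence of the observations
given H. Thus the numerator in (102) can be evaluated from
$y^2_t$, $\pi^2_{t-1}$ and $P(z^1_t \mid H = 0, z^1_{1:t-1})$. Similar expressions can be
obtained for the terms in the denominator of (102). Therefore,
we have

$$\pi^2_t(y^2_{1:t}, z^1_{1:t}) = \tilde{T}_{t-1}(\pi^2_{t-1}, y^2_t, P(z^1_t \mid H, z^1_{1:t-1})) \tag{104}$$

For $z^1_{1:t} = b_{1:t}$, we have

$$\pi^2_t(y^2_{1:t}, b_{1:t}) = \tilde{T}_{t-1}(\pi^2_{t-1}, y^2_t, P(Z^1_t = b \mid H, Z^1_{1:t-1} = b_{1:t-1}))$$
$$= \tilde{T}_{t-1}(\pi^2_{t-1}, y^2_t, P(\pi^1_t \in C_t \mid H, Z^1_{1:t-1} = b_{1:t-1})) \tag{105}$$

where $C_t := [0, \alpha^1_t) \cup (\beta^1_t, \delta^1_t) \cup (\theta^1_t, 1]$. The conditional
probability in the argument of $\tilde{T}_{t-1}$ is a function of $\psi_t$ and
the thresholds specified by $\gamma^1_t$. Thus, when $z^1_{1:t} = b_{1:t}$,

$$\pi^2_t = \tilde{T}_{t-1}(\pi^2_{t-1}, y^2_t, \psi_t, \gamma^1_t) \tag{106}$$

(since the function $\gamma^1_t$ is completely characterized by a set of
thresholds, we use $\gamma^1_t$ to denote the set of thresholds).

Because of (106), the expression in (101) can now be
expressed as:

$$P(H = h, \pi^1_t = \pi^1, \tilde{T}_{t-1}(\pi^2_{t-1}, Y^2_t, \psi_t, \gamma^1_t) = \pi^2, D_t = 1 \mid Z^1_{1:t} = b_{1:t}) \tag{107}$$

which can further be expressed as

$$\frac{P(H = h, \pi^1_t = \pi^1, \tilde{T}_{t-1}(\pi^2_{t-1}, Y^2_t, \psi_t, \gamma^1_t) = \pi^2, D_t = 1, Z^1_t = b \mid Z^1_{1:t-1} = b_{1:t-1})}{P(Z^1_t = b \mid Z^1_{1:t-1} = b_{1:t-1})}$$
$$= \frac{P(H = h, \pi^1_t = \pi^1, \tilde{T}_{t-1}(\pi^2_{t-1}, Y^2_t, \psi_t, \gamma^1_t) = \pi^2, D_t = 1, Z^1_t = b \mid Z^1_{1:t-1} = b_{1:t-1})}{P(\pi^1_t \in C_t \mid Z^1_{1:t-1} = b_{1:t-1})} \tag{108}$$

where $C_t := [0, \alpha^1_t) \cup (\beta^1_t, \delta^1_t) \cup (\theta^1_t, 1]$. The denominator is a
function of a marginal distribution of $\psi_t$. To simplify the
numerator, first note that $\psi_t$ is fixed already by the choice of
decision functions till time t - 1. The numerator in (108) can
therefore be written as:

$$\int_{y, \pi'} 1_{\tilde{T}_{t-1}(\pi', y, \psi_t, \gamma^1_t) = \pi^2} \cdot P(H = h, \pi^1_t = \pi^1, Y^2_t = y, \pi^2_{t-1} = \pi', D_t = 1, Z^1_t = b \mid Z^1_{1:t-1} = b_{1:t-1})$$
$$= \int_{y, \pi'} 1_{\tilde{T}_{t-1}(\pi', y, \psi_t, \gamma^1_t) = \pi^2} \cdot 1_{\pi^1 \in C_t} \cdot P(Y^2_t = y \mid H = h) \cdot P(H = h, \pi^1_t = \pi^1, \pi^2_{t-1} = \pi', D_t = 1 \mid Z^1_{1:t-1} = b_{1:t-1})$$
$$= \int_{y, \pi'} 1_{\tilde{T}_{t-1}(\pi', y, \psi_t, \gamma^1_t) = \pi^2} \cdot 1_{\pi^1 \in C_t} \cdot P(Y^2_t = y \mid H = h) \cdot \psi_t(h, \pi^1, \pi', 1) \tag{109}$$

The expression in (109) is a function of $\psi_t$ and the thresholds
specified by $\gamma^1_t$. Since (109) is equal to the numerator of (108),
it follows from (108) and (109), and the fact that the
denominator of (108) is a marginal distribution of $\psi_t$, that

$$\phi_t[b](h, \pi^1, \pi^2, 1) = Q^1(\psi_t, \gamma^1_t, b)$$

Similar analysis holds for $D_t = 0$ and also for $\phi_t[0]$ and $\phi_t[1]$. ■

Appendix VI
Proof of Theorem 6

Proof: With the appropriate definitions of the information
states $\psi_t$ and $\phi_t$, the proof of Theorem 6 is similar to that
of Theorem 5. As in the proof of Theorem 5, we proceed
backward in time.

Consider first the final horizon for O1: $T^1$. Assume that
the designer has already specified functions $\gamma^1_1, \gamma^1_2, \ldots, \gamma^1_{T^1-1}$
for O1 and $\gamma^2_1, \gamma^2_2, \ldots, \gamma^2_{T^1-1}$ for O2. The designer has to
select a function $\gamma^1_{T^1}$ to be used by O1 at time $T^1$ in case O1's
final message has not already been sent. By Theorem 2,
this function is characterized by a single threshold $\alpha^1_{T^1}$. The
expected future cost for the designer is the cost of a Wald
problem with horizon $T^2 - T^1$, if observer 2 has not already
declared its final decision. Thus, the expected cost for the
designer is: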



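$$E[K^{T^2 - T^1}(\pi^2_{T^1}) \cdot D_{T^1} \mid Z^1_{1:T^1-1} = b_{1:T^1-1}] \tag{110}$$

where $K^{T^2 - T^1}(\cdot)$ is the cost of using the optimal Wald
thresholds from $T^1$ onwards with an available time horizon
of $T^2 - T^1$. This cost can be expressed as:

$$E[K^{T^2 - T^1}(\pi^2_{T^1}) \cdot D_{T^1} \mid Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 1] \cdot P(Z^1_{T^1} = 1 \mid Z^1_{1:T^1-1} = b_{1:T^1-1})$$
$$+ E[K^{T^2 - T^1}(\pi^2_{T^1}) \cdot D_{T^1} \mid Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0] \cdot P(Z^1_{T^1} = 0 \mid Z^1_{1:T^1-1} = b_{1:T^1-1})$$
$$= \Big[ \int_{\pi^2} K^{T^2 - T^1}(\pi^2) \cdot P(\pi^2_{T^1} = \pi^2, D_{T^1} = 1 \mid Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 1) \Big] \cdot P(\pi^1_{T^1} \le \alpha^1_{T^1} \mid Z^1_{1:T^1-1} = b_{1:T^1-1})$$
$$+ \Big[ \int_{\pi^2} K^{T^2 - T^1}(\pi^2) \cdot P(\pi^2_{T^1} = \pi^2, D_{T^1} = 1 \mid Z^1_{1:T^1-1} = b_{1:T^1-1}, Z^1_{T^1} = 0) \Big] \cdot P(\pi^1_{T^1} > \alpha^1_{T^1} \mid Z^1_{1:T^1-1} = b_{1:T^1-1})$$
$$=: \mathcal{L}_{T^1}(\phi_{T^1}[0], \phi_{T^1}[1], \psi_{T^1}, \alpha^1_{T^1}) \tag{111}$$

where we used the fact that the probabilities in the integrals are
marginals of $\phi_{T^1}[1]$ and $\phi_{T^1}[0]$ respectively, and the probabilities
multiplying the integrals are marginals of $\psi_{T^1}$. Using Lemma
4, we can write (111) as

$$=: \mathcal{F}_{T^1}(\psi_{T^1}, \alpha^1_{T^1}) \tag{112}$$

Thus the optimization problem for the designer is to select
$\alpha^1_{T^1}$ to minimize $\mathcal{F}_{T^1}(\psi_{T^1}, \alpha^1_{T^1})$. Define

$$\mathcal{F}^*_{T^1}(\psi_{T^1}) := \min_{\alpha^1_{T^1}} \mathcal{F}_{T^1}(\psi_{T^1}, \alpha^1_{T^1})$$

For a given $\psi_{T^1}$, the function $\mathcal{F}^*_{T^1}$ describes the optimal future
cost for the designer and the optimizing $\alpha^1_{T^1}$ gives the best
threshold.

Proceeding backwards, assume $\mathcal{F}^*_{t+1}$ describes the designer's
future cost from time t + 1. We now consider the designer's
problem of selecting thresholds $\alpha^2_t, \beta^2_t$ to be used by O2 if it
received all blank messages from O1, that is, $Z^1_{1:t} = b_{1:t}$. The
cost at time t is J(0, H) if observer 2 stops and declares 0, and
J(1, H) if observer 2 declares 1. In case observer 2 does not
make a final decision at this point, a cost of $c^2$ is incurred.
The future cost for the designer will be the optimal cost at
time t + 1, which is given by $\mathcal{F}^*_{t+1}(\psi_{t+1})$. Thus the expected
cost is given as:

$$E[c^1(\tau^1 - (t + 1)) + \{c^2(\tau^2 - t) + J(U_{\tau^2}, H)\} \cdot D_t \mid Z^1_{1:t} = b_{1:t}]$$
$$= E[\{J(1, H) \cdot 1_{\pi^2_t \le \alpha^2_t} + J(0, H) \cdot 1_{\pi^2_t \ge \beta^2_t} + c^2 \cdot 1_{\alpha^2_t < \pi^2_t < \beta^2_t}\} \cdot D_t \mid Z^1_{1:t} = b_{1:t}] + \mathcal{F}^*_{t+1}(\psi_{t+1}) \tag{113}$$
$$=: \mathcal{G}_t(\phi_t[b], \alpha^2_t, \beta^2_t) \tag{114}$$

where we used the fact that the expectation in (113) depends
on the thresholds $\alpha^2_t, \beta^2_t$ and on the conditional belief on H, $D_t$
and $\pi^2_t$ given $Z^1_{1:t} = b_{1:t}$, which is a marginal of $\phi_t[b]$, and that,
by Lemma 4, $\psi_{t+1}$ is a function of $\phi_t[b]$ and these thresholds. Thus
the optimization problem for the designer is to select $\alpha^2_t, \beta^2_t$
to minimize $\mathcal{G}_t(\phi_t[b], \alpha^2_t, \beta^2_t)$. Define

$$\mathcal{G}^*_t(\phi_t[b]) := \min_{\alpha^2_t, \beta^2_t} \mathcal{G}_t(\phi_t[b], \alpha^2_t, \beta^2_t)$$

Now consider the designer's problem of selecting thresholds
$\alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t$ to be used by O1 at time t. The expected future
cost is $K^{T^2 - t}(\pi^2_t)$ if a final message is sent at time t and if
O2 had not already stopped (that is, $D_t = 1$). In case a blank
message is sent, the designer will need to choose thresholds
at time t for O2 and the optimal future cost would be given
by $c^1 + \mathcal{G}^*_t(\phi_t[b])$. The total expected future cost is therefore,

$$E[c^1(\tau^1 - t) + \{c^2(\tau^2 - t) + J(U_{\tau^2}, H)\} \cdot D_t \mid Z^1_{1:t-1} = b_{1:t-1}]$$
$$= E[K^{T^2 - t}(\pi^2_t) \cdot D_t \mid Z^1_t = 0, Z^1_{1:t-1} = b_{1:t-1}] \cdot P(Z^1_t = 0 \mid Z^1_{1:t-1} = b_{1:t-1})$$
$$+ E[K^{T^2 - t}(\pi^2_t) \cdot D_t \mid Z^1_t = 1, Z^1_{1:t-1} = b_{1:t-1}] \cdot P(Z^1_t = 1 \mid Z^1_{1:t-1} = b_{1:t-1})$$
$$+ [c^1 + \mathcal{G}^*_t(\phi_t[b])] \cdot P(Z^1_t = b \mid Z^1_{1:t-1} = b_{1:t-1})$$
$$= E[K^{T^2 - t}(\pi^2_t) \cdot D_t \mid Z^1_t = 0, Z^1_{1:t-1} = b_{1:t-1}] \cdot P(\delta^1_t \le \pi^1_t \le \theta^1_t \mid Z^1_{1:t-1} = b_{1:t-1})$$
$$+ E[K^{T^2 - t}(\pi^2_t) \cdot D_t \mid Z^1_t = 1, Z^1_{1:t-1} = b_{1:t-1}] \cdot P(\alpha^1_t \le \pi^1_t \le \beta^1_t \mid Z^1_{1:t-1} = b_{1:t-1})$$
$$+ [c^1 + \mathcal{G}^*_t(\phi_t[b])] \cdot P(\pi^1_t \in C_t \mid Z^1_{1:t-1} = b_{1:t-1}) \tag{115}$$
$$=: \mathcal{L}_t(\phi_t[0], \phi_t[1], \phi_t[b], \psi_t, \alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t) \tag{116}$$
$$=: \mathcal{F}_t(\psi_t, \alpha^1_t, \beta^1_t, \delta^1_t, \theta^1_t) \tag{117}$$

where, to write (116), we used the fact that the two expectations
in (115) are functions of $\phi_t[0]$ and $\phi_t[1]$ (this can be
established using analysis similar to that leading to (111)) and
the probabilities multiplying the three terms are marginals of
$\psi_t$. Further, since $\phi_t[0]$, $\phi_t[1]$ and $\phi_t[b]$ are functions of $\psi_t$
and the thresholds of $\gamma^1_t$ (Lemma 4), we can
write (116) as (117). The analysis for time t can be inductively
repeated for all times. ■

Appendix VII
Proof of Lemma 5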

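The first part of the lemma follows directly from the fact that
$V^{T^2}_t$ is defined as an infimum over a monotonically increasing
sequence of sets $\mathcal{A}^{T^2}$.

We will now prove the second part of the lemma. $V^{\infty}_t$ is
defined as the infimum of the objective over the set of policies
$\mathcal{A}^{\infty}$, which contains $\mathcal{A}^{T^2}$ for all $T^2$; hence we conclude that

$$V^{\infty}_t(z^1_{1:t}, \pi^2_t) \le \lim_{T^2 \to \infty} V^{T^2}_t(z^1_{1:t}, \pi^2_t) \tag{118}$$

Assume that the inequality in (118) is strict. Then, there exists
a policy $G \in \mathcal{A}^{\infty}$ for observer 2 such that the expected cost
under G,

$$W_t(G) := E^{G}[c^2\tau^2 + J(U^2_{\tau^2}, H) \mid y^2_{1:t}, z^1_{1:t}]$$

is strictly less than $\lim_{T^2 \to \infty} V^{T^2}_t(z^1_{1:t}, \pi^2_t)$. Therefore, the
policy G is better than any finite horizon policy. We will
now construct a sequence of finite horizon policies $G_{T^2}$, $T^2 =
t, t + 1, t + 2, \ldots$ such that the expected cost of $G_{T^2}$ approaches
the expected cost of policy G as $T^2 \to \infty$. This will contradict
the fact that $W_t(G) < \lim_{T^2 \to \infty} V^{T^2}_t(z^1_{1:t}, \pi^2_t)$. Let $\tau^G$ and
$U_{\tau^G}$ be the stopping time and the decision at the stopping
time induced under policy G. The policy $G_{T^2}$ is characterized
by the stopping time $\tau'$ and the decision at the stopping time $U_{\tau'}$
that it induces, as follows:

$$\tau' = \begin{cases} \tau^G & \text{if } \tau^G < T^2 \\ T^2 & \text{if } \tau^G \ge T^2 \end{cases}$$

and

$$U_{\tau'} = \begin{cases} U_{\tau^G} & \text{if } \tau^G < T^2 \\ 0 & \text{if } \tau^G \ge T^2 \end{cases}$$

Note that $G_{T^2}$ is a finite horizon policy since it always stops no
later than the horizon $T^2$. Define

$$W_t(G_{T^2}) := E^{G_{T^2}}[c^2\tau' + J(U_{\tau'}, H) \mid y^2_{1:t}, z^1_{1:t}]$$

By assumption, the cost under policy G is better than the
cost under any finite horizon policy. Therefore, $W_t(G_{T^2}) \ge
W_t(G)$. Moreover,

$$W_t(G_{T^2}) - W_t(G) = E^{G_{T^2}}[c^2\tau' + J(U_{\tau'}, H) \mid y^2_{1:t}, z^1_{1:t}] - E^{G}[c^2\tau^G + J(U_{\tau^G}, H) \mid y^2_{1:t}, z^1_{1:t}]$$
$$= E[c^2(\tau' - \tau^G) + J(U_{\tau'}, H) - J(U_{\tau^G}, H) \mid y^2_{1:t}, z^1_{1:t}]$$
$$= E[\{c^2(\tau' - \tau^G) + J(U_{\tau'}, H) - J(U_{\tau^G}, H)\} \cdot 1_{\tau^G < T^2} \mid y^2_{1:t}, z^1_{1:t}]$$
$$+ E[\{c^2(\tau' - \tau^G) + J(U_{\tau'}, H) - J(U_{\tau^G}, H)\} \cdot 1_{\tau^G \ge T^2} \mid y^2_{1:t}, z^1_{1:t}] \tag{119}$$

The first expectation in equation (119) is 0 since for $\tau^G <
T^2$, the policy $G_{T^2}$ has the same stopping time and the same final
decision as policy G. Thus, we get:

$$W_t(G_{T^2}) - W_t(G) = E[\{c^2(\tau' - \tau^G) + J(U_{\tau'}, H) - J(U_{\tau^G}, H)\} \cdot 1_{\tau^G \ge T^2} \mid y^2_{1:t}, z^1_{1:t}]$$
$$= E[\{c^2(T^2 - \tau^G) + J(0, H) - J(U_{\tau^G}, H)\} \cdot 1_{\tau^G \ge T^2} \mid y^2_{1:t}, z^1_{1:t}]$$
$$\le E[\{J(0, H) - J(U_{\tau^G}, H)\} \cdot 1_{\tau^G \ge T^2} \mid y^2_{1:t}, z^1_{1:t}] \tag{120}$$
$$\le L \cdot E[1_{\tau^G \ge T^2} \mid y^2_{1:t}, z^1_{1:t}] = L \cdot P(\tau^G \ge T^2 \mid y^2_{1:t}, z^1_{1:t}) \tag{121}$$

where L is the finite positive constant that upper-bounds
J(U, H). Since the stopping time under policy G is almost
surely finite (otherwise the cost of the policy would be infinite), we
have that $P(\tau^G \ge T^2 \mid y^2_{1:t}, z^1_{1:t}) \to 0$ as $T^2 \to \infty$. Thus, for
any $\epsilon > 0$, there exists a horizon $T^2$ large enough for which
$W_t(G_{T^2}) - W_t(G) < \epsilon$. Therefore,

$$\lim_{T^2 \to \infty} W_t(G_{T^2}) = W_t(G)$$

Hence, we conclude that there does not exist any policy $G \in
\mathcal{A}^{\infty}$ for which $W_t(G) < \lim_{T^2 \to \infty} V^{T^2}_t(z^1_{1:t}, \pi^2_t)$. Therefore,

$$V^{\infty}_t(z^1_{1:t}, \pi^2_t) = \lim_{T^2 \to \infty} V^{T^2}_t(z^1_{1:t}, \pi^2_t).$$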
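The truncation used in this proof, which turns an infinite-horizon policy into a finite-horizon one by forcing a stop at the horizon, can be sketched as a small wrapper. Representing a policy as a function of time and history that returns `None` to continue, and forcing the decision 0 at the horizon, are modelling choices made for this illustration only.

```python
def truncate_policy(policy, horizon, forced_decision=0):
    """Build a finite-horizon version of a stopping policy.

    policy(t, history) returns a decision (0 or 1) to stop at time t,
    or None to continue. The truncated policy copies it before the
    horizon and declares `forced_decision` at the horizon itself.
    """
    def truncated(t, history):
        if t >= horizon:
            return forced_decision      # must stop by the horizon
        return policy(t, history)       # same action as the original
    return truncated


def stopping_time(policy, history, max_t=10_000):
    """First time the policy stops along a given history, with its decision."""
    for t in range(1, max_t + 1):
        u = policy(t, history[:t])
        if u is not None:
            return t, u
    raise RuntimeError("policy did not stop within max_t steps")


if __name__ == "__main__":
    # Hypothetical policy: declare 1 at the first time t >= 5.
    wait_five = lambda t, history: 1 if t >= 5 else None
    print(stopping_time(truncate_policy(wait_five, 3), [0] * 20))   # stops early
    print(stopping_time(truncate_policy(wait_five, 10), [0] * 20))  # unchanged
```

On realizations where the original policy stops before the horizon, the two policies coincide, which is why only the event that the original stopping time reaches the horizon contributes to the cost difference.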



Appendix VIII 
Proof of Lemma 6 

The fist part of the lemma follows directly from the fact that 
Vf^ is defined as infimum over a monotonically increasing 
sequence of sets . 

We will now prove the second part of the lemma. Since 
B°° contains 6^\vr\ we conclude that < 
limyi^oc Vf^ (ttI). Assume that the inequality is strict. Then, 
there exists a policy A e for observer 1 such that the 
expected cost under A, 



Wt{A) ■.= E^'^"[c^T^ 



2 2 

C T ■ 



JiU^.,H)\yl,,] 



is strictly less than limj-i^oo V^^^ i^t)- Therefore, the policy A 
is better than any finite horizon policy. We will now construct 
a sequence of finite horizon policies A^ijT^ = t,t + l,t + 
2, ... such that the expected cost of Ayi approaches the cost 
of policy A as ^ cx). This will contradict the fact that 
WtiA) <]imTi^ooVt^\7rl). 

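Let $\tau^A$ and $Z^1_{\tau^A}$ be the stopping time and the decision at
the stopping time induced under policy $A$. The policy $A_{T^1}$
is characterized by the stopping time $\tau^*$ and the decision at
the stopping time, $Z^1_{\tau^*}$, that it induces, as follows:

\[
\tau^* =
\begin{cases}
\tau^A & \text{if } \tau^A \le T^1 \\
T^1 & \text{if } \tau^A > T^1
\end{cases}
\]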



and

\[
Z^1_{\tau^*} =
\begin{cases}
Z^1_{\tau^A} & \text{if } \tau^A \le T^1 \\
0 & \text{if } \tau^A > T^1
\end{cases}
\]
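The truncation used in this construction can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `truncate_policy_outcome` is hypothetical, and the message sent when the policy is forced to stop at the horizon is taken to be 0 here purely as an assumption (the bound derived below does not depend on which message is sent).

```python
from typing import Tuple


def truncate_policy_outcome(
    tau: int, message: int, horizon: int, default_message: int = 0
) -> Tuple[int, int]:
    """Map one realization (tau, message) of the original policy to the truncated policy.

    If the original policy stops by the horizon, nothing changes. Otherwise the
    truncated policy stops at the horizon and sends default_message, which is an
    assumed placeholder value.
    """
    if tau <= horizon:
        return tau, message
    return horizon, default_message


# Realizations that stop in time are unchanged; late ones are cut off at the horizon.
print(truncate_policy_outcome(tau=3, message=1, horizon=5))
print(truncate_policy_outcome(tau=9, message=1, horizon=5))
```

Because the returned stopping time is never larger than `horizon`, the truncated policy is a finite horizon policy, and it differs from the original policy only on realizations with `tau > horizon`.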



Note that $A_{T^1}$ is a finite horizon policy since it always stops no
later than the horizon $T^1$. Define

\[ W_t(A_{T^1}) := E^{A_{T^1},\Gamma^2}\big[c^1\tau^1 + c^2\tau^2 + J(U^2_{\tau^2},H) \mid y^1_{1:t}\big] \]

By assumption, the cost under policy $A$ is better than the cost
under any finite horizon policy. Therefore, $W_t(A_{T^1}) > W_t(A)$.
Moreover,



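\begin{align}
W_t(A_{T^1}) - W_t(A)
&= E^{A_{T^1},\Gamma^2}\big[c^1\tau^1 + c^2\tau^2 + J(U^2_{\tau^2},H) \mid y^1_{1:t}\big]
 - E^{A,\Gamma^2}\big[c^1\tau^1 + c^2\tau^2 + J(U^2_{\tau^2},H) \mid y^1_{1:t}\big] \notag \\
&= E\big[c^1(\tau^* - \tau^A) \mid y^1_{1:t}\big]
 + E^{A_{T^1},\Gamma^2}\big[c^2\tau^2 + J(U^2_{\tau^2},H) \mid y^1_{1:t}\big]
 - E^{A,\Gamma^2}\big[c^2\tau^2 + J(U^2_{\tau^2},H) \mid y^1_{1:t}\big] \tag{122} \\
&\le E^{A_{T^1},\Gamma^2}\big[c^2\tau^2 + J(U^2_{\tau^2},H) \mid y^1_{1:t}\big]
 - E^{A,\Gamma^2}\big[c^2\tau^2 + J(U^2_{\tau^2},H) \mid y^1_{1:t}\big] \tag{123}
\end{align}

where we used the fact that, since $\tau^* \le \tau^A$, the first term in
(122) is less than or equal to 0. Further, (123) can be written
as: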

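\begin{align}
& E^{A_{T^1},\Gamma^2}\big[\{c^2\tau^2 + J(U^2_{\tau^2},H)\}\, 1_{\{\tau^A \le T^1\}} \mid y^1_{1:t}\big] \notag \\
&- E^{A,\Gamma^2}\big[\{c^2\tau^2 + J(U^2_{\tau^2},H)\}\, 1_{\{\tau^A \le T^1\}} \mid y^1_{1:t}\big] \notag \\
&+ E^{A_{T^1},\Gamma^2}\big[\{c^2\tau^2 + J(U^2_{\tau^2},H)\}\, 1_{\{\tau^A > T^1\}} \mid y^1_{1:t}\big] \notag \\
&- E^{A,\Gamma^2}\big[\{c^2\tau^2 + J(U^2_{\tau^2},H)\}\, 1_{\{\tau^A > T^1\}} \mid y^1_{1:t}\big] \tag{124}
\end{align}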

For all realizations where $\tau^A \le T^1$, the policy $A_{T^1}$ has
the same stopping time and the same final decision as policy $A$.
Hence, both policies send the same realization of messages
to O2, and O2's policy $\Gamma^2$ produces the same
realizations of $\tau^2$ and $U^2_{\tau^2}$. This implies that the first two terms
in (124) are equal. Thus, (124) becomes

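\begin{align}
& E^{A_{T^1},\Gamma^2}\big[\{c^2\tau^2 + J(U^2_{\tau^2},H)\}\, 1_{\{\tau^A > T^1\}} \mid y^1_{1:t}\big]
 - E^{A,\Gamma^2}\big[\{c^2\tau^2 + J(U^2_{\tau^2},H)\}\, 1_{\{\tau^A > T^1\}} \mid y^1_{1:t}\big] \notag \\
&\le (c^2 T^2 + L) \cdot E\big[1_{\{\tau^A > T^1\}} \mid y^1_{1:t}\big] \notag \\
&= (c^2 T^2 + L) \cdot P(\tau^A > T^1 \mid y^1_{1:t}) \tag{125}
\end{align}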



where we used the fact that $\tau^2$ is bounded by $T^2$ under policy
$\Gamma^2$ by assumption. Since the stopping time under policy $A$
is almost surely finite (otherwise the cost of the policy would be
infinite), we have that $P(\tau^A > T^1 \mid y^1_{1:t}) \to 0$ as $T^1 \to \infty$.
Thus, from equations (122)-(125), we conclude that for any
$\epsilon > 0$, there exists a horizon $T^1$ large enough such that
$W_t(A_{T^1}) - W_t(A) < \epsilon$. Therefore,

\[ \lim_{T^1 \to \infty} W_t(A_{T^1}) = W_t(A). \]

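Hence, we conclude that there does not exist any policy
$A \in \mathcal{B}^\infty$ for which $W_t(A) < \lim_{T^1 \to \infty} V_t^{T^1}(\pi^1_t)$. Therefore,

\[ V_t^{\infty}(\pi^1_t) = \lim_{T^1 \to \infty} V_t^{T^1}(\pi^1_t). \]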



Acknowledgments

This research was supported in part by NSF Grant CCR-0325571
and NASA Grant NNX06AD47G.



References

[1] R. R. Tenney and N. R. Sandell Jr., "Detection with distributed sensors," IEEE Trans. Aerospace Electron. Systems, vol. AES-17, no. 4, pp. 501-510, July 1981.

[2] J. N. Tsitsiklis, "Decentralized detection," in Advances in Statistical Signal Processing. JAI Press, 1993, pp. 297-344.

[3] P. K. Varshney, Distributed Detection and Data Fusion. Springer, 1997.

[4] R. Radner, "Team decision problems," The Annals of Math. Statistics, vol. 33, no. 3, pp. 857-881, Sept. 1962.

[5] J. N. Tsitsiklis, "Decentralized detection by a large number of sensors," Mathematics of Control, Signals and Systems, vol. 1, no. 2, pp. 167-182, 1988.

[6] J.-F. Chamberland and V. V. Veeravalli, "Asymptotic results for decentralized detection in power-constrained wireless sensor networks," IEEE Journal on Selected Areas in Communication, vol. 22, no. 6, pp. 1007-1015, Aug. 2004.

[7] R. Ahlswede and I. Csiszar, "Hypothesis testing with communication constraints," IEEE Trans. on Info. Theory, vol. IT-32, no. 4, pp. 533-543, 1986.

[8] V. V. Veeravalli, T. Basar, and H. Poor, "Decentralized sequential detection with a fusion center performing the sequential test," IEEE Trans. Inform. Theory, vol. 39, pp. 433-442, Mar. 1993.

[9] D. Teneketzis and Y. C. Ho, "The decentralized Wald problem," Information and Computation, 73, pp. 23-44, 1987.

[10] A. LaVigna, A. M. Makowski, and J. S. Baras, "A continuous-time distributed version of the Wald's sequential hypothesis testing problem," Lecture Notes in Control and Information Sciences, vol. 83, pp. 533-543, 1986.

[11] D. Teneketzis and P. Varaiya, "The decentralized quickest detection problem," IEEE Trans. on Automatic Control, vol. AC-29, no. 7, pp. 641-644, July 1984.

[12] A. Wald, Sequential Analysis. Wiley, New York, 1947.

[13] Y. C. Ho, "Team decision theory and information structures," in Proceedings of the IEEE, vol. 68, no. 6, 1980, pp. 644-654.

[14] J. N. Tsitsiklis, "On threshold rules in decentralized detection," in Proceedings of 25th IEEE Conference of Decision and Control, Dec. 1986, pp. 232-236.

[15] H. S. Witsenhausen, "Separation of estimation and control for discrete time systems," Proceedings of the IEEE, vol. 59, no. 11, pp. 1557-1566, Nov. 1971.

[16] A. Nayyar and D. Teneketzis, "On the structure of real-time encoders and decoders in a multi-terminal communication system," IEEE Trans. Info. Theory, submitted.

[17] J. Walrand and P. Varaiya, "Optimal causal coding-decoding problems," IEEE Trans. Inf. Theory, vol. IT-29, no. 6, pp. 814-820, Nov. 1983.



